Based on substrate sequences, we proposed a novel method for comparing sequence similarities among 68 proteases compiled from the MEROPS online database. The rank vector was defined based on the frequencies of amino acids at each site of the substrate, aiming to eliminate the different order variances of magnitude between proteases. Without any assumption on homology, a protease specificity tree is constructed with a striking clustering of proteases from different evolutionary origins and catalytic types. Compared with other methods, almost all the homologous proteases are clustered in small branches in our phylogenetic tree, and the proteases belonging to the same catalytic type are also clustered together, which may reflect the genetic relationship among the proteases. Meanwhile, certain proteases clustered together may play a similar role in key pathways categorized using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Consequently, this method can provide new insights into the shared similarities among proteases. This may inspire the design and development of targeted drugs that can specifically regulate protease activity.
Citation: Enfeng Qi, Can Fu, Ying Zhai, Jianghui Dong. Insights into protease sequence similarities by comparing substrate sequences and phylogenetic dynamics[J]. Mathematical Biosciences and Engineering, 2021, 18(1): 837-850. doi: 10.3934/mbe.2021044
1. Introduction
Estimating foreign exchange rate (FX) volatility is a core risk management activity for financial institutions, corporates, and regulators. The subject has been extensively investigated by both practitioners and academic researchers, and several alternative models exist. Among the most prominent are the models belonging to the GARCH and stochastic volatility classes. However, the true value of volatility cannot be directly observed. Hence, volatility must be estimated, inevitably with error. This constitutes a fundamental problem in implementing parametric models, especially in the context of high-frequency data. Andersen and Bollerslev (1998) proposed using realized volatility, as derived from high-frequency data, to accurately measure the true latent integrated volatility. This approach has gained attention for volatility modeling in markets where tick-level data is available (Andersen et al., 2013). Andersen et al. (2003) suggest fractionally integrated ARFIMA models in this context. Still, the long-memory HAR (heterogeneous autoregressive) model of Corsi (2009) is arguably the most widely used to capture the high persistence typically observed in the realized volatility of financial prices. The HAR model is relatively simple and easy to estimate. In empirical applications, the model tends to perform better than GARCH and stochastic volatility models, possibly due to the sensitivity of tightly parameterized volatility models to minor model misspecifications (Sizova, 2011). Although realized volatility (RV) is a consistent estimator of the true latent volatility, it is subject to measurement error in empirical finite samples. Hence, RV will reflect not only the true latent integrated volatility (IV), but also additional measurement errors. Bollerslev et al. (2016) propose utilizing higher-order realized moments of the realized distribution to approximate these measurement errors. More specifically, Bollerslev et al. (2016) propose the HARQ model, which augments the HAR model with realized quarticity as an additional covariate.
Empirical applications of the HARQ model in the context of foreign exchange rate risk are sparse. Lyócsa et al. (2016) find that the standard HAR model is rarely outperformed by less parsimonious specifications on CZKEUR, PLZEUR, and HUFEUR data. Plíhal et al. (2021) and Rokicka and Kudła (2020) estimate the HARQ model on EURUSD and EURGBP data, respectively. Their focus differs from ours, as they investigate the incremental predictive power of implied volatility for a broad class of HAR models. In a similar vein, Götz (2023) and Lyócsa et al. (2024) utilize the HARQ model to estimate foreign exchange rate tail risk.
Using updated tick-level data from two major currency pairs, EURUSD and USDJPY, this paper documents the relevance of realized quarticity for improving volatility estimates across varying forecasting horizons. These results are robust across estimation windows, evaluation metrics, and model specifications.
2. Materials and method
2.1. Data
We use high-frequency intraday tick-level spot data, publicly available at DukasCopy. The sample period is 1 January 2010 to 31 December 2022. Liu et al. (2015) investigate the optimal intraday sampling frequency across a significant number of asset classes and find that 5-minute intervals usually outperform others. Hence, as is common in the literature, we estimate the realized volatility from 5-minute returns.
To filter tick-level data, we follow a two-step cleaning procedure based on the recommendations by Barndorff-Nielsen et al. (2009). Initially, we eliminate data entries that exhibit any of the following issues: (i) absence of quotes, (ii) a negative bid-ask spread, (iii) a bid-ask spread exceeding 50 times the median spread of the day, or (iv) a mid-quote deviation beyond ten mean absolute deviations from a centered mean (computed excluding the current observation from a window of 25 observations before and after). Following this, we calculate the mid-quotes as the average of the bid and ask quotes and then resample the data at 5-minute intervals.
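The first three filters and the resampling step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `clean_and_resample`, the `bid`/`ask` column names, and the choice to forward-fill empty 5-minute bins are assumptions, and filter (iv) on mid-quote outliers is omitted for brevity.

```python
import pandas as pd

def clean_and_resample(ticks: pd.DataFrame, freq: str = "5min") -> pd.Series:
    """Apply filters (i)-(iii) and return mid-quotes on a regular grid.

    `ticks` is assumed to have a DatetimeIndex and columns 'bid' and 'ask'
    (a hypothetical layout chosen for this sketch).
    """
    q = ticks.dropna(subset=["bid", "ask"])        # (i) missing quotes
    q = q[q["ask"] >= q["bid"]]                    # (ii) negative bid-ask spread
    spread = q["ask"] - q["bid"]
    # (iii) spread larger than 50 times that day's median spread
    daily_median = spread.groupby(spread.index.date).transform("median")
    q = q[(spread <= 50 * daily_median).to_numpy()]
    mid = (q["bid"] + q["ask"]) / 2                # mid-quote
    # Last mid-quote in each interval; empty intervals carry the previous value
    return mid.resample(freq).last().ffill()
```

Filter (iv) would be applied to `mid` before resampling, using a centered window of 25 observations on each side that excludes the current observation.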
We compute the consistent estimator of the true latent time-$t$ variance from
$$RV_t^2 \equiv \sum_{i=1}^{M} r_{t,i}^2, \tag{1}$$
where $M = 1/\Delta$, and the $\Delta$-period intraday return is $r_{t,i} \equiv \log(S_{t-1+i\times\Delta}) - \log(S_{t-1+(i-1)\times\Delta})$, where $S$ is the spot exchange rate. Analogously, the multi($h$)-period realized variance estimator is
$$RV^2_{t-1\mid t-h} = \frac{1}{h}\sum_{i=1}^{h} RV^2_{t-i}. \tag{2}$$
Setting $h=5$ and $h=22$ yields weekly and monthly estimates, respectively.
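As an illustration of (1) and (2), the sketch below computes daily realized variances from a grid of intraday prices and the $h$-day averages used as weekly and monthly regressors. The array layout (one row per day, $M+1$ prices per row) and the function names are assumptions made for this example.

```python
import numpy as np

def realized_variance(prices_by_day: np.ndarray) -> np.ndarray:
    """Equation (1): sum of squared intraday log returns, one value per day.

    `prices_by_day` has shape (T, M + 1): M + 1 equally spaced prices per day.
    """
    r = np.diff(np.log(prices_by_day), axis=1)
    return np.sum(r ** 2, axis=1)

def multi_period_rv(rv: np.ndarray, h: int) -> np.ndarray:
    """Equation (2): mean of the h most recent daily values ending at each day.

    Entry s averages rv[s-h+1 .. s]; the first h - 1 entries are NaN.
    The regressor dated t-1 in the HAR model is therefore entry t-1.
    """
    out = np.full(rv.shape, np.nan)
    c = np.cumsum(np.insert(rv, 0, 0.0))
    out[h - 1:] = (c[h:] - c[:-h]) / h
    return out
```

With 5-minute sampling over a 24-hour FX trading day, $M$ would be 288, although the sketch works for any $M$.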
Table 1 displays descriptive statistics for daily realized variances, as computed from (1).
Table 1. Realized Variance (daily).

| | Min | Mean | Median | Max | ρ1 |
|---|---|---|---|---|---|
| EURUSD | 0.1746 | 3.0606 | 2.2832 | 59.4513 | 0.5529 |
| USDJPY | 0.1018 | 3.2460 | 2.0096 | 168.0264 | 0.2860 |

Note: The table contains summary statistics for the daily RVs for EURUSD and USDJPY. ρ1 is the standard first-order autocorrelation coefficient. Sample period: 1 January 2010 to 31 December 2022.
To represent the long-memory dynamic dependencies in volatility, Corsi (2009) proposed using daily, weekly, and monthly lags of realized volatility as covariates. The original HAR model is defined as
$$RV_t = \beta_0 + \beta_1 RV_{t-1} + \beta_2 RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + u_t, \tag{3}$$
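where RV is computed from (1) and (2). If the variables in (3) contain measurement errors, the beta coefficients will be affected. Bollerslev et al. (2016) suggest two measures to alleviate this. First, they include a proxy for measurement error, the realized quarticity $RQ_t \equiv \frac{M}{3}\sum_{i=1}^{M} r_{t,i}^4$, as an additional explanatory variable. Furthermore, they directly adjust the coefficients in proportion to the magnitude of the measurement errors:
$$RV_t = \beta_0 + (\beta_1 + \beta_{1Q} RQ^{1/2}_{t-1}) RV_{t-1} + (\beta_2 + \beta_{2Q} RQ^{1/2}_{t-1\mid t-5}) RV_{t-1\mid t-5} + (\beta_3 + \beta_{3Q} RQ^{1/2}_{t-1\mid t-22}) RV_{t-1\mid t-22} + u_t, \tag{4}$$
where $RQ_{t-1\mid t-h}$ is defined analogously to (2).
The full HARQ model (HARQ-F) in (4) adjusts the coefficients on all lags of RV. A reasonable conjecture is that measurement errors in realized volatilities tend to diminish at longer forecast horizons, as these errors are diversified over time. This suggests that measurement errors in daily lagged realized volatilities are likely to be relatively more important. Motivated by this, Bollerslev et al. (2016) specify the HARQ model as
$$RV_t = \beta_0 + (\beta_1 + \beta_{1Q} RQ^{1/2}_{t-1}) RV_{t-1} + \beta_2 RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + u_t. \tag{5}$$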
Although there is no reason to expect that autoregressive models of order one will be able to accurately capture long memory in realized volatility, we estimate AR(1) models as a point of reference. The AR and ARQ models are defined as
$$RV_t = \beta_0 + \beta_1 RV_{t-1} + u_t \tag{6}$$
and
$$RV_t = \beta_0 + \underbrace{(\beta_1 + \beta_{1Q} RQ^{1/2}_{t-1})}_{\beta_{1,t}} RV_{t-1} + u_t, \tag{7}$$
respectively.
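To make the regression structure concrete, the sketch below builds the design matrix for the HAR regression in (3), optionally adds the daily quarticity interaction $RQ^{1/2}_{t-1} RV_{t-1}$ used in (7), and estimates the coefficients by ordinary least squares. The function names and the plain `numpy` least-squares fit are assumptions for illustration; robust standard errors are not computed here.

```python
import numpy as np

def lagged_mean(x: np.ndarray, h: int) -> np.ndarray:
    """Mean of the h most recent values ending at each index (NaN before h exist)."""
    out = np.full(x.shape, np.nan)
    c = np.cumsum(np.insert(x, 0, 0.0))
    out[h - 1:] = (c[h:] - c[:-h]) / h
    return out

def harq_design(rv: np.ndarray, rq: np.ndarray, quarticity: bool = True):
    """Return (X, y) with regressors dated t-1 and target RV_t.

    Columns: constant, daily lag, weekly mean, monthly mean and, if
    `quarticity` is True, the interaction sqrt(RQ_{t-1}) * RV_{t-1}.
    """
    cols = [np.ones_like(rv), rv, lagged_mean(rv, 5), lagged_mean(rv, 22)]
    if quarticity:
        cols.append(np.sqrt(rq) * rv)
    X = np.column_stack(cols)[21:-1]   # first day with a full 22-day mean
    y = rv[22:]                        # next-day realized variance
    return X, y

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Calling `harq_design(rv, rq, quarticity=False)` gives the plain HAR regression, so both models can be fitted on identical samples.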
3. Results and discussion
Due to noisy data and related estimation errors, forecasts from realized volatility models might occasionally appear as unreasonably high or low. Thus, in line with Swanson et al. (1997) and Bollerslev et al. (2016), we filter forecasts from all models so that any forecast outside the empirical distribution of the estimation sample is replaced by the sample mean.
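A minimal sketch of such a forecast filter is given below. It assumes that "outside the empirical distribution" means below the minimum or above the maximum of the realized variances in the estimation sample; the function name is hypothetical.

```python
import numpy as np

def filter_forecasts(forecasts, rv_estimation_sample):
    """Replace forecasts outside the estimation-sample range by the sample mean."""
    sample = np.asarray(rv_estimation_sample, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    lo, hi, mean = sample.min(), sample.max(), sample.mean()
    return np.where((f < lo) | (f > hi), mean, f)
```

For example, with an estimation sample of `[1, 2, 3]`, forecasts of `0.5` and `10` would both be replaced by `2.0`, while `2.5` would pass through unchanged.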
3.1. In-sample estimation results
Table 2 reports in-sample parameter estimates for the ARQ, HARQ, and HARQ-F models, along with the benchmark AR and HAR models, for one-day-ahead EURUSD (upper panel) and USDJPY (lower panel) volatility forecasts. Robust standard errors (s.e.) are computed as proposed by White (1980). R2, MSE, and QLIKE are displayed at the bottom of each panel.
Note: The table contains in-sample parameter estimates and corresponding standard errors (White, 1980), together with R2, MSE, and QLIKE, the latter two computed from (12) and (13). Superscripts *, **, and *** represent statistical significance in a two-sided t-test at 1%, 5% and 10% levels, respectively.
The coefficients β1Q are negative and exhibit strong statistical significance, aligning with the hypothesis that RQ represents time-varying measurement error. When comparing the autoregressive (AR) coefficient of the AR model to the autoregressive parameters in the ARQ model, the AR coefficient is markedly lower, reflecting the difference in persistence between the models.
In the comparative analysis of the HAR and HARQ models applied to both currency pairs, the HAR model assigns more emphasis to the weekly and monthly lags, which are generally less sensitive to measurement errors. In contrast, the HARQ model typically assigns a higher weight to the daily lag. However, when measurement errors are substantial, the HARQ model reduces the weight on the daily lag to accommodate the time-varying nature of the measurement errors in the daily realized volatility (RV). The flexible version of this model, the HARQ-F, allows for variability in the weekly and monthly lags, resulting in slightly altered parameters compared to the standard HARQ model. Notably, the coefficients β2Q and β3Q in the HARQ-F model are statistically significant, and this model demonstrates a modest enhancement in in-sample fit relative to the HARQ model.
3.2. Out-of-sample forecasting results
To further assess the out-of-sample performance of the HARQ model, we consider three alternative HAR type specifications. More specifically, we include both the HAR-with-Jumps (HAR-J) and the Continuous-HAR (CHAR) proposed by Andersen et al. (2007), as well as the SHAR model proposed by Patton and Sheppard (2015), in the forecasting comparisons. Based on the Bi-Power Variation (BPV) measure of Barndorff-Nielsen and Shephard (2004), HAR-J and CHAR decompose the total variation into a continuous and a discontinuous (jump) part.
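The HAR-J model augments the standard HAR model with a measure of the jump variation:
$$RV_t = \beta_0 + \beta_1 RV_{t-1} + \beta_2 RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + \beta_J J_{t-1} + u_t, \tag{8}$$
where $J_t \equiv \max[RV_t - BPV_t, 0]$, and the BPV measure is defined as
$$BPV_t \equiv \mu_1^{-2} \sum_{i=1}^{M-1} |r_{t,i}|\,|r_{t,i+1}|, \tag{9}$$
with $\mu_1 = \sqrt{2/\pi} = E(|Z|)$, where $Z$ is a standard normal random variable.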
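The bipower variation and the truncated jump measure can be computed from one day's intraday returns as sketched below. The function names are hypothetical; the formulas follow the definitions of BPV and $J_t$ given above.

```python
import numpy as np

MU1 = np.sqrt(2.0 / np.pi)  # E|Z| for a standard normal Z

def bipower_variation(r: np.ndarray) -> float:
    """BPV: scaled sum of products of adjacent absolute intraday returns."""
    a = np.abs(r)
    return float(np.sum(a[:-1] * a[1:]) / MU1 ** 2)

def jump_variation(r: np.ndarray) -> float:
    """J = max(RV - BPV, 0), with RV the sum of squared intraday returns."""
    rv = float(np.sum(r ** 2))
    return max(rv - bipower_variation(r), 0.0)
```

A single isolated large return contributes fully to RV but little to BPV, because each of its neighbouring products is damped by a small adjacent return; this is why the difference is read as jump variation.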
The CHAR model includes measures of the continuous component of the total variation as covariates:
$$RV_t = \beta_0 + \beta_1 BPV_{t-1} + \beta_2 BPV_{t-1\mid t-5} + \beta_3 BPV_{t-1\mid t-22} + u_t. \tag{10}$$
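Inspired by the semivariation measures of Barndorff-Nielsen et al. (2008), Patton and Sheppard (2015) propose the SHAR model, which, in contrast to the HAR model, effectively allows for asymmetric responses in volatility forecasts from negative and positive intraday returns. More specifically, when $RV^{-}_t \equiv \sum_{i=1}^{M} r_{t,i}^2 I\{r_{t,i}<0\}$ and $RV^{+}_t \equiv \sum_{i=1}^{M} r_{t,i}^2 I\{r_{t,i}>0\}$, the SHAR model is defined as:
$$RV_t = \beta_0 + \beta_1^{+} RV^{+}_{t-1} + \beta_1^{-} RV^{-}_{t-1} + \beta_2 RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + u_t. \tag{11}$$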
To evaluate model performance, we consider the mean squared error (MSE) and the QLIKE loss, both of which, according to Patton (2011), are robust to noise. MSE is defined as
$$MSE(RV_t, F_t) \equiv (RV_t - F_t)^2, \tag{12}$$
where $F_t$ refers to the one-period direct forecast. QLIKE is defined as
$$QLIKE(RV_t, F_t) \equiv \frac{RV_t}{F_t} - \ln\left(\frac{RV_t}{F_t}\right) - 1. \tag{13}$$
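Both loss functions are straightforward to evaluate element-wise. The sketch below is an illustration with hypothetical function names; it assumes strictly positive realized variances and forecasts, which QLIKE requires.

```python
import numpy as np

def mse_loss(rv, forecast):
    """Equation (12): squared forecast error, element-wise."""
    rv, forecast = np.asarray(rv, float), np.asarray(forecast, float)
    return (rv - forecast) ** 2

def qlike_loss(rv, forecast):
    """Equation (13): RV/F - ln(RV/F) - 1, element-wise; zero when F equals RV."""
    ratio = np.asarray(rv, float) / np.asarray(forecast, float)
    return ratio - np.log(ratio) - 1.0
```

QLIKE is asymmetric: for a given absolute error it penalizes under-prediction of volatility more heavily than over-prediction, whereas MSE treats both symmetrically.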
3.2.1. Daily forecasting horizon
Table 3 contains one-day-ahead forecasts for EURUSD and USDJPY. The table reports model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. The lowest ratio on each row, which identifies the best-performing model, is in bold. We evaluate the models using both a rolling window (RW) and an expanding window (EW). In both cases, model parameters are re-estimated each day: the RW has a fixed length and comprises the previous 1000 days, whereas the EW uses all observations available up to the forecast date. The sample sizes for EW thus range from 1000 to 3201 days. The results are consistent in that the HARQ-F model is the best performer for both currency pairs and across loss functions and estimation windows. The HARQ model is closest to HARQ-F. Neither HAR-J, CHAR, nor SHAR appears to consistently improve upon the standard HAR model.
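The rolling and expanding estimation schemes can be sketched as a single loop. This is an illustrative implementation under stated assumptions: row `t` of `X` holds the regressors known before `y[t]` is observed, parameters are re-fitted by ordinary least squares at every step, and the function name is hypothetical.

```python
import numpy as np

def window_forecasts(X, y, window=1000, expanding=False):
    """One-step-ahead forecasts with daily re-estimation.

    Rolling: fit on the previous `window` rows. Expanding: fit on all rows
    before t. The first `window` entries of the result are NaN.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    preds = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        start = 0 if expanding else t - window
        beta, *_ = np.linalg.lstsq(X[start:t], y[start:t], rcond=None)
        preds[t] = X[t] @ beta
    return preds
```

The normalized losses reported in the tables then correspond to the mean loss of a model's forecasts divided by the mean loss of the HAR forecasts over the same out-of-sample days.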
Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best-performing model on each row in bold. Asterisks * and ** denote the 1% and 5% significance levels from one-sided Diebold-Mariano tests of superior performance of the best-performing model compared to the HAR model.
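A one-sided Diebold-Mariano comparison of this kind can be sketched as follows. This is a simplified illustration, not the exact procedure behind the tables: it uses the plain sample variance of the loss differential (no autocorrelation correction, which multi-step forecasts would need) and a normal approximation for the p-value. The function name is hypothetical.

```python
import math
import numpy as np

def dm_one_sided(loss_model, loss_benchmark):
    """Test H0: equal expected loss against H1: the model has lower loss.

    Returns (statistic, p_value); a large positive statistic favours the model.
    """
    d = np.asarray(loss_benchmark, float) - np.asarray(loss_model, float)
    n = d.size
    stat = d.mean() / math.sqrt(d.var(ddof=1) / n)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # 1 - Phi(stat)
    return stat, p_value
```

Here `loss_model` and `loss_benchmark` would be the per-day MSE or QLIKE losses of the candidate model and of the HAR model over the same out-of-sample period.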
Judging from Table 3, it is beneficial to include RQ as an explanatory variable when RV is measured inaccurately. However, precise measurement of RV becomes more difficult when RV is high, inducing a positive correlation between RV and RQ. At the same time, high RV often coincides with jumps. To clarify whether the performance of RQ-based models is due to jump dynamics, Table 4 further segments the results in Table 3 into forecasts for days when the previous day's RQ was very high (Top 5% RQ, Table 4b) and the remaining sample (Bottom 95% RQ, Table 4a). As this breakdown shows, the RQ-based models perform relatively well also during periods of non-extreme heteroscedasticity of RQ.
Note: The table segments the results in Table 3 according to RQ. The bottom panel shows the ratios for days following a value of RQ in the top 5%. The top panel shows the results for the remaining 95% of sample. Ratio for the best performing model on each row in bold.
In practitioner applications, forecast horizons longer than one day are often of interest. We now extend our analysis to weekly and monthly horizons, using direct forecasts. The daily forecast analysis in Section 3.2.1 indicates that the lag order of RQ plays an important role in forecast accuracy. Hence, following Bollerslev et al. (2016), we consider the HARQ-h model, and adjust the lag corresponding to the specific forecast horizon only. Specifically, for the weekly and monthly forecasts analysed here, the relevant HARQ-h specifications become
$$RV_{t+4\mid t} = \beta_0 + \beta_1 RV_{t-1} + (\beta_2 + \beta_{2Q} RQ^{1/2}_{t-1\mid t-5}) RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + u_t$$
and
$$RV_{t+21\mid t} = \beta_0 + \beta_1 RV_{t-1} + \beta_2 RV_{t-1\mid t-5} + (\beta_3 + \beta_{3Q} RQ^{1/2}_{t-1\mid t-22}) RV_{t-1\mid t-22} + u_t.$$
Table 5 presents in-sample parameter estimates across model specifications. The patterns observed here closely resemble those of the daily estimates detailed in Table 2. All coefficients on RQ (β1Q, β2Q, β3Q) are negative and, except for the (h=22) lag, statistically significant. This indicates that capturing measurement errors is relevant also for forecast horizons beyond one day. The HARQ model consistently allocates greater weight to the daily lag compared to the standard HAR model. Similarly, the HARQ-h model predominantly allocates its weight towards the time-varying lag. The weights of the HARQ-F model on the different lags are relatively more stable when compared to the HARQ-h model.
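Table 5. In-sample weekly and monthly model estimates.

(a) EURUSD, weekly (h=5)

| | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 0.8646* | 0.2634* | 0.5680* | 0.4758* | −0.0250 | 0.2275** |
| s.e. | 0.1345 | 0.0927 | 0.0997 | 0.0882 | 0.0895 | 0.0861 |
| β1 | 0.7168* | 0.9620* | 0.1194* | 0.2752* | 0.1836* | 0.1181* |
| s.e. | 0.0480 | 0.0400 | 0.0264 | 0.0395 | 0.0269 | 0.0214 |
| β2 | | | 0.3938* | 0.3395* | 0.5777* | 0.7635* |
| s.e. | | | 0.0887 | 0.0881 | 0.1282 | 0.1139 |
| β3 | | | 0.3008* | 0.2440* | 0.3131** | 0.0876 |
| s.e. | | | 0.0880 | 0.0817 | 0.1275 | 0.0940 |
| β1Q | | −5.4876* | | −1.0749* | −0.4728* | |
| s.e. | | 0.4817 | | 0.1377 | 0.1005 | |
| β2Q | | | | | −2.7357* | −4.9739* |
| s.e. | | | | | 0.9302 | 0.7181 |
| β3Q | | | | | −5.6441* | |
| s.e. | | | | | 1.4540 | |
| R2 | 0.5138 | 0.5642 | 0.5453 | 0.5604 | 0.5843 | 0.5756 |
| MSE | 2.6073 | 2.3370 | 2.4385 | 2.3576 | 2.2292 | 2.2759 |
| QLIKE | 0.0862 | 0.0731 | 0.0752 | 0.0735 | 0.0679 | 0.0704 |

(a) EURUSD, monthly (h=22)

| | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 1.6388* | 0.9642* | 0.9269* | 0.8452* | 0.2328 | 0.2153 |
| s.e. | 0.1806 | 0.1840 | 0.2246 | 0.2099 | 0.2080 | 0.2093 |
| β1 | 0.4616* | 0.7373* | 0.0717* | 0.2097* | 0.1131* | 0.0646* |
| s.e. | 0.0564 | 0.0616 | 0.0205 | 0.0401 | 0.0248 | 0.0185 |
| β2 | | | 0.2091* | 0.1606* | 0.3706* | 0.2176* |
| s.e. | | | 0.0587 | 0.0554 | 0.0962 | 0.0563 |
| β3 | | | 0.4163* | 0.3661* | 0.5153* | 0.7179* |
| s.e. | | | 0.1186 | 0.1174 | 0.1498 | 0.1106 |
| β1Q | | −6.1534* | | −0.9499* | −0.3246* | |
| s.e. | | 0.9900 | | 0.1815 | 0.0846 | |
| β2Q | | | | | −2.3111* | |
| s.e. | | | | | 0.8020 | |
| β3Q | | | | | −7.8467* | −10.9979* |
| s.e. | | | | | 2.1071 | 1.9082 |
| R2 | 0.4297 | 0.5191 | 0.5072 | 0.5237 | 0.5678 | 0.5568 |
| MSE | 2.1913 | 1.8477 | 1.8932 | 1.8299 | 1.6606 | 1.7027 |
| QLIKE | 0.1073 | 0.0804 | 0.0839 | 0.0012 | 0.0760 | 0.0788 |

(b) USDJPY, weekly (h=5)

| | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 2.0305* | 1.1976* | 1.3310* | 1.1591* | 0.8708* | 0.9646* |
| s.e. | 0.2484 | 0.1550 | 0.1967 | 0.1646 | 0.1701 | 0.1564 |
| β1 | 0.3709* | 0.6801* | 0.0687* | 0.2722* | 0.1650* | 0.0668* |
| s.e. | 0.0717 | 0.0512 | 0.0266 | 0.0500 | 0.0373 | 0.0207 |
| β2 | | | 0.1294 | 0.0742 | 0.3558* | 0.4971* |
| s.e. | | | 0.0700 | 0.0609 | 0.0787 | 0.0790 |
| β3 | | | 0.3910* | 0.3147* | 0.2622* | 0.1829* |
| s.e. | | | 0.0703 | 0.0621 | 0.0959 | 0.0693 |
| β1Q | | −0.6085* | | −0.1190* | −0.0571* | |
| s.e. | | 0.0534 | | 0.0173 | 0.0141 | |
| β2Q | | | | | −0.3653* | −0.5357* |
| s.e. | | | | | 0.0704 | 0.0648 |
| β3Q | | | | | −0.2750 | |
| s.e. | | | | | 0.2010 | |
| R2 | 0.1367 | 0.2323 | 0.1848 | 0.2270 | 0.2557 | 0.2475 |
| MSE | 11.6923 | 10.3980 | 11.0412 | 10.4701 | 10.0811 | 10.1919 |
| QLIKE | 0.2361 | 0.4197 | 0.2057 | 0.1937 | 0.4076 | 0.1405 |

(b) USDJPY, monthly (h=22)

| | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 2.5786* | 2.2815* | 1.7358* | 1.6356* | 1.3894* | 1.3900* |
| s.e. | 0.1928 | 0.2245 | 0.2792 | 0.2678 | 0.3106 | 0.3151 |
| β1 | 0.2011* | 0.3121* | 0.0286** | 0.1460* | 0.0829* | 0.0283* |
| s.e. | 0.0363 | 0.0566 | 0.0119 | 0.0258 | 0.0166 | 0.0113 |
| β2 | | | 0.0865* | 0.0541 | 0.1886* | 0.0923* |
| s.e. | | | 0.0389 | 0.0333 | 0.0487 | 0.0376 |
| β3 | | | 0.3460* | 0.3030* | 0.3340* | 0.4811* |
| s.e. | | | 0.0916 | 0.0883 | 0.1346 | 0.1220 |
| β1Q | | −0.2167** | | −0.0678* | −0.0318* | |
| s.e. | | 0.0832 | | 0.0093 | 0.0068 | |
| β2Q | | | | | −0.1659* | |
| s.e. | | | | | 0.0465 | |
| β3Q | | | | | −0.3946 | −0.7392** |
| s.e. | | | | | 0.2942 | 0.2900 |
| R2 | 0.1414 | 0.2106 | 0.2205 | 0.2496 | 0.2761 | 0.2542 |
| MSE | 5.4365 | 4.9983 | 4.9351 | 4.7513 | 4.58326 | 4.7220 |
| QLIKE | 0.2143 | 0.1973 | 0.1801 | 0.1734 | 0.1634 | 0.1680 |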
Note: In-sample parameter estimates for weekly (h=5) and monthly (h=22) forecasting models. EURUSD in upper panel (Table 5a) and USDJPY in lower panel (Table 5b). Robust standard errors (s.e.) using Newey and West (1987) accommodate autocorrelation up to order 10 (h=5), and 44 (h=22), respectively. Superscripts *, ** and *** represent statistical significance in a two-sided t-test at 1%, 5%, and 10% levels.
Table 6 and Table 7 detail the out-of-sample performance for weekly and monthly forecasts, respectively. Notably, the HAR-J, CHAR, and SHAR models generally fail to demonstrate consistent improvements over the basic HAR model. This is a sharp contrast to the RQ-augmented models. The HARQ-F model outperforms the HAR model both for EURUSD and USDJPY for nearly all instances. Also, HARQ-h delivers forecasts that are relatively consistent with the HAR model. Judging from both weekly and monthly results, the inherent flexibility of the HARQ-F is beneficial also for longer-term forecasts. We note that, at the monthly forecasting horizon for USDJPY, there is some variability as to preferred Q-specifications. Also, in some monthly instances, the Diebold-Mariano null hypothesis of equal predictability cannot be rejected. This is not unreasonable, since the number of independent monthly observations naturally becomes lower than for corresponding shorter forecasting horizons, leading to higher parameter uncertainty and related noise in volatility estimates.
Table 6. Weekly out-of-sample forecast losses.

EURUSD

| | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.3063 | 1.0000 | 0.9636 | 0.9884 | 1.0017 | 1.1459 | 0.9677 | 0.9024* | 0.9205 |
| MSE-EW | 1.2702 | 1.0000 | 0.9433 | 0.9559 | 0.9997 | 1.1288 | 0.9501 | 0.8996* | 0.9117 |
| QLIKE-RW | 1.5923 | 1.0000 | 0.9819 | 0.9840 | 0.9995 | 1.3558 | 0.9932 | 0.8701 | 0.9283 |
| QLIKE-EW | 1.7682 | 1.0000 | 0.9874 | 1.0031 | 1.0033 | 1.4134 | 0.9648 | 0.8832* | 0.9297 |

USDJPY

| | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0618 | 1.0000 | 0.9464 | 0.9509 | 0.9965 | 0.9064 | 0.8971 | 0.8393* | 0.8443 |
| MSE-EW | 1.1707 | 1.0000 | 1.0148 | 1.0021 | 1.0336 | 1.0194 | 0.9388 | 0.8993 | 0.8976* |
| QLIKE-RW | 1.3119 | 1.0000 | 1.0057 | 0.9910 | 0.9740 | 1.0493 | 0.9099 | 0.8246* | 0.8359 |
| QLIKE-EW | 1.3847 | 1.0000 | 0.9918 | 0.9768 | 1.0002 | 1.1391 | 0.9179 | 0.8350* | 0.8463 |
Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best-performing model on each row in bold. Asterisks * and ** denote the 1% and 5% significance levels from one-sided Diebold-Mariano tests of superior performance of the best-performing model compared to the HAR model.
The intention of the HARQ model is to capture the heteroskedastic measurement error of realized variance. The HARQ model in (5) approximates this through the square root of RQ. Bollerslev et al. (2016) argue that this counters possible issues with numerical stability. Still, this specification is somewhat ad hoc, and a number of reasonable alternatives exist. To clarify whether the performance of the HARQ model is sensitive to the definition of RQ, we follow Bollerslev et al. (2016) and substitute $RQ$, $RQ^{-1/2}$, $RQ^{-1}$, and $\log(RQ)$ in place of $RQ^{1/2}$. Furthermore, we augment the standard HAR and HARQ models with $RQ^{1/2}$ as an additional explanatory variable, which allows the HAR(Q) model intercept to be time-varying.
Table 8 reports the out-of-sample forecast results from the alternative HARQ specifications. We normalize all losses by those of the HARQ model based on $RQ^{1/2}$.
Table 8. Alternative HARQ Specifications.

EURUSD

| | RQ | RQ^{1/2} | RQ^{-1/2} | RQ^{-1} | log(RQ) | HAR (adding RQ^{1/2}) | HARQ (adding RQ^{1/2}) |
|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0023 | 1.0000 | 1.0263 | 1.0246 | 1.0092 | 1.0309 | 1.0052 |
| MSE-IW | 1.0016 | 1.0000 | 1.0274 | 1.0265 | 1.0069 | 1.0292 | 1.0086 |
| QLIKE-RW | 1.0042 | 1.0000 | 1.0326 | 1.0304 | 1.0007 | 1.0250 | 1.0067 |
| QLIKE-IW | 1.0014 | 1.0000 | 1.0064 | 1.0254 | 0.9937 | 1.0044 | 1.0164 |

USDJPY

| | RQ | RQ^{1/2} | RQ^{-1/2} | RQ^{-1} | log(RQ) | HAR (adding RQ^{1/2}) | HARQ (adding RQ^{1/2}) |
|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0001 | 1.0000 | 1.1345 | 1.1225 | 1.0516 | 1.1202 | 1.0118 |
| MSE-IW | 1.0049 | 1.0000 | 1.0606 | 1.0543 | 0.9931 | 1.0512 | 1.0186 |
| QLIKE-RW | 1.0097 | 1.0000 | 1.1439 | 1.1067 | 0.9794 | 1.0731 | 1.0455 |
| QLIKE-IW | 1.0188 | 1.0000 | 1.1105 | 1.0841 | 0.9322 | 1.0358 | 0.9989 |

Note: Model performance, expressed as model loss normalized by the loss of the HARQ model based on RQ^{1/2}. Each row reflects a combination of estimation window and loss function. Ratio for the best-performing model on each row in bold. The first five columns report the results based on alternative RQ interaction terms. The two rightmost columns report the results from including RQ^{1/2} as an explanatory variable.
The two rightmost columns of Table 8 reveal that including RQ^{1/2} as an explanatory variable in the HAR and HARQ models does not lead to improved forecasts. Similarly, applying alternative RQ transformations does not appear to be helpful. Overall, we conclude that the HARQ model demonstrates greater stability and is generally favored over the alternative specifications.
3.3.2. Alternative Q-Models
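HARQ is essentially an expansion of the HAR model. In a similar vein, the other benchmark volatility models can be extended accordingly. Following Bollerslev et al. (2016), from the HAR-J model defined in (8), we construct the HARQ-J model:
$$RV_t = \beta_0 + (\beta_1 + \beta_{1Q} RQ^{1/2}_{t-1}) RV_{t-1} + \beta_2 RV_{t-1\mid t-5} + \beta_3 RV_{t-1\mid t-22} + \beta_J J_{t-1} + u_t.$$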
Table 9 compares out-of-sample forecast results from each of the alternative Q-models (HARQ-J, CHARQ, and SHARQ), to their non-Q adjusted baseline specification. We also include the HARQ model. For both currencies, the enhancements seen in the HARQ-J and CHARQ models align with those observed in the basic HARQ model. This is in contrast to the SHARQ model, which is outperformed by SHAR. Bollerslev et al. (2016) report similar results.
Table 9. Out-of-sample forecast losses for alternative Q-models.

EURUSD

| | HARQ | HARQ-J | CHARQ | SHARQ |
|---|---|---|---|---|
| MSE-RW | 0.9759 | 0.9693 | 0.9749 | 1.0613 |
| MSE-IW | 0.9742 | 0.9563 | 0.9567 | 1.0315 |
| QLIKE-RW | 0.9767 | 0.9845 | 0.9750 | 1.1473 |
| QLIKE-IW | 0.9952 | 0.9960 | 0.9893 | 0.9987 |

USDJPY

| | HARQ | HARQ-J | CHARQ | SHARQ |
|---|---|---|---|---|
| MSE-RW | 0.8885 | 0.8916 | 0.8914 | 1.0953 |
| MSE-IW | 0.9446 | 0.9322 | 0.9389 | 0.8965 |
| QLIKE-RW | 0.8824 | 0.8471 | 0.9040 | 1.3887 |
| QLIKE-IW | 0.8949 | 0.8942 | 0.9178 | 0.8974 |
Note: Model performance, expressed as model loss normalized by the loss of the relevant baseline models without the Q-adjustment terms. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold.
Recent history contains two independent events that separately have induced turbulence in the global macroeconomy and financial markets. One is the outbreak of COVID-19 in March 2020; another is the Russian invasion of Ukraine in the second half of 2022, as illustrated in Figure 1.
To analyze this period of extreme market conditions in isolation, we perform a sub-sample analysis covering 2020–2022. Table 10 contains out-of-sample results for day-ahead volatility forecasts. Reassuringly, the overall results remain intact, in that the HARQ-F model is the best performing model also when this extreme period is considered in isolation.
Table 10. Day-ahead out-of-sample forecast losses, 2020–2022 subsample.

EURUSD

| | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.2522 | 1.0000 | 0.9781 | 0.9745 | 1.0041 | 1.0425 | 0.9517 | 0.9304 |
| MSE-IW | 1.2068 | 1.0000 | 0.9813 | 0.9764 | 0.9979 | 1.0976 | 0.9806 | 0.9677 |
| QLIKE-RW | 1.3216 | 1.0000 | 1.0169 | 0.9829 | 1.0093 | 1.1370 | 0.9446 | 0.9065 |
| QLIKE-IW | 1.5585 | 1.0000 | 1.0085 | 1.0119 | 1.0059 | 1.2338 | 0.9725 | 0.9701 |

USDJPY

| | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0930 | 1.0000 | 1.0555 | 0.9909 | 0.9822 | 0.9895 | 0.9564 | 0.9348 |
| MSE-IW | 1.1099 | 1.0000 | 0.9958 | 0.9850 | 1.0112 | 1.0523 | 1.0071 | 0.9827 |
| QLIKE-RW | 1.3404 | 1.0000 | 1.2635 | 1.0136 | 0.9845 | 0.9509 | 0.8611 | 0.8677 |
| QLIKE-IW | 1.4766 | 1.0000 | 0.9939 | 0.9808 | 1.0108 | 1.0231 | 0.8453 | 0.7868 |
Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold.
This study uses updated tick-level data from two major currency pairs, EURUSD and USDJPY, covering January 2010 to December 2022, to investigate the relevance of realized quarticity for out-of-sample volatility forecasts. We find that realized quarticity effectively captures noise caused by measurement errors, as evidenced by increased precision in daily, weekly, and monthly volatility estimates from models augmented with realized quarticity as an additional explanatory variable. These results are robust across estimation windows, evaluation metrics, and model specifications. As such, the results conform to comparable studies from other markets, predominantly on equity indices and single stocks. This paper also complements the relatively scarce body of literature on foreign exchange markets in this context.
A myriad of volatility models based on the HAR framework have been proposed. Still, simple linear HAR specifications have proven remarkably difficult to beat, as shown by Audrino et al. (2024) and Branco et al. (2024). In a recent survey, Gunnarsson et al. (2024) report promising results for machine learning models and volatility forecasting across asset classes. The FX implied volatility surface contains a rich set of relevant predictive information across forecasting horizons and quantiles (de Lange et al., 2022). Thus, combining implied volatilities and high-frequency data using machine learning models, along the lines of Blom et al. (2023), appears as an interesting avenue for future research.
Rarely does a single model dominate others in terms of both statistical and economic criteria. To this end, investigating ensemble models where high-frequency models are combined with other volatility model classes, such as time series models and stochastic volatility models (possibly including jump processes), should be of interest. The recently developed rough-path volatility models based on fractional Brownian motion (Salmon and SenGupta, 2021; Bayer et al., 2023) appear particularly relevant in this context.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
M.H.: Data Curation, Writing - Original Draft, Writing - Review & Editing.
Acknowledgments
We would like to thank Andrew Patton for making the Matlab code from Bollerslev et al. (2016) available at https://public.econ.duke.edu/ap172/. Furthermore, we are grateful for insightful comments from the Editor and two anonymous reviewers, which helped us improve the paper.
Conflict of interest
The authors declare no conflicts of interest.
Table 5. In-sample weekly and monthly model estimates.

(a) EURUSD

| Weekly | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 0.8646* | 0.2634* | 0.5680* | 0.4758* | −0.0250 | 0.2275** |
| s.e. | 0.1345 | 0.0927 | 0.0997 | 0.0882 | 0.0895 | 0.0861 |
| β1 | 0.7168* | 0.9620* | 0.1194* | 0.2752* | 0.1836* | 0.1181* |
| s.e. | 0.0480 | 0.0400 | 0.0264 | 0.0395 | 0.0269 | 0.0214 |
| β2 | | | 0.3938* | 0.3395* | 0.5777* | 0.7635* |
| s.e. | | | 0.0887 | 0.0881 | 0.1282 | 0.1139 |
| β3 | | | 0.3008* | 0.2440* | 0.3131** | 0.0876 |
| s.e. | | | 0.0880 | 0.0817 | 0.1275 | 0.0940 |
| β1Q | | −5.4876* | | −1.0749* | −0.4728* | |
| s.e. | | 0.4817 | | 0.1377 | 0.1005 | |
| β2Q | | | | | −2.7357* | −4.9739* |
| s.e. | | | | | 0.9302 | 0.7181 |
| β3Q | | | | | −5.6441* | |
| s.e. | | | | | 1.4540 | |
| R² | 0.5138 | 0.5642 | 0.5453 | 0.5604 | 0.5843 | 0.5756 |
| MSE | 2.6073 | 2.3370 | 2.4385 | 2.3576 | 2.2292 | 2.2759 |
| QLIKE | 0.0862 | 0.0731 | 0.0752 | 0.0735 | 0.0679 | 0.0704 |

| Monthly | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 1.6388* | 0.9642* | 0.9269* | 0.8452* | 0.2328 | 0.2153 |
| s.e. | 0.1806 | 0.1840 | 0.2246 | 0.2099 | 0.2080 | 0.2093 |
| β1 | 0.4616* | 0.7373* | 0.0717* | 0.2097* | 0.1131* | 0.0646* |
| s.e. | 0.0564 | 0.0616 | 0.0205 | 0.0401 | 0.0248 | 0.0185 |
| β2 | | | 0.2091* | 0.1606* | 0.3706* | 0.2176* |
| s.e. | | | 0.0587 | 0.0554 | 0.0962 | 0.0563 |
| β3 | | | 0.4163* | 0.3661* | 0.5153* | 0.7179* |
| s.e. | | | 0.1186 | 0.1174 | 0.1498 | 0.1106 |
| β1Q | | −6.1534* | | −0.9499* | −0.3246* | |
| s.e. | | 0.9900 | | 0.1815 | 0.0846 | |
| β2Q | | | | | −2.3111* | |
| s.e. | | | | | 0.8020 | |
| β3Q | | | | | −7.8467* | −10.9979* |
| s.e. | | | | | 2.1071 | 1.9082 |
| R² | 0.4297 | 0.5191 | 0.5072 | 0.5237 | 0.5678 | 0.5568 |
| MSE | 2.1913 | 1.8477 | 1.8932 | 1.8299 | 1.6606 | 1.7027 |
| QLIKE | 0.1073 | 0.0804 | 0.0839 | 0.0012 | 0.0760 | 0.0788 |

(b) USDJPY

| Weekly | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 2.0305* | 1.1976* | 1.3310* | 1.1591* | 0.8708* | 0.9646* |
| s.e. | 0.2484 | 0.1550 | 0.1967 | 0.1646 | 0.1701 | 0.1564 |
| β1 | 0.3709* | 0.6801* | 0.0687* | 0.2722* | 0.1650* | 0.0668* |
| s.e. | 0.0717 | 0.0512 | 0.0266 | 0.0500 | 0.0373 | 0.0207 |
| β2 | | | 0.1294 | 0.0742 | 0.3558* | 0.4971* |
| s.e. | | | 0.0700 | 0.0609 | 0.0787 | 0.0790 |
| β3 | | | 0.3910* | 0.3147* | 0.2622* | 0.1829* |
| s.e. | | | 0.0703 | 0.0621 | 0.0959 | 0.0693 |
| β1Q | | −0.6085* | | −0.1190* | −0.0571* | |
| s.e. | | 0.0534 | | 0.0173 | 0.0141 | |
| β2Q | | | | | −0.3653* | −0.5357* |
| s.e. | | | | | 0.0704 | 0.0648 |
| β3Q | | | | | −0.2750 | |
| s.e. | | | | | 0.2010 | |
| R² | 0.1367 | 0.2323 | 0.1848 | 0.2270 | 0.2557 | 0.2475 |
| MSE | 11.6923 | 10.3980 | 11.0412 | 10.4701 | 10.0811 | 10.1919 |
| QLIKE | 0.2361 | 0.4197 | 0.2057 | 0.1937 | 0.4076 | 0.1405 |

| Monthly | AR | ARQ | HAR | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|
| β0 | 2.5786* | 2.2815* | 1.7358* | 1.6356* | 1.3894* | 1.3900* |
| s.e. | 0.1928 | 0.2245 | 0.2792 | 0.2678 | 0.3106 | 0.3151 |
| β1 | 0.2011* | 0.3121* | 0.0286** | 0.1460* | 0.0829* | 0.0283* |
| s.e. | 0.0363 | 0.0566 | 0.0119 | 0.0258 | 0.0166 | 0.0113 |
| β2 | | | 0.0865* | 0.0541 | 0.1886* | 0.0923* |
| s.e. | | | 0.0389 | 0.0333 | 0.0487 | 0.0376 |
| β3 | | | 0.3460* | 0.3030* | 0.3340* | 0.4811* |
| s.e. | | | 0.0916 | 0.0883 | 0.1346 | 0.1220 |
| β1Q | | −0.2167** | | −0.0678* | −0.0318* | |
| s.e. | | 0.0832 | | 0.0093 | 0.0068 | |
| β2Q | | | | | −0.1659* | |
| s.e. | | | | | 0.0465 | |
| β3Q | | | | | −0.3946 | −0.7392** |
| s.e. | | | | | 0.2942 | 0.2900 |
| R² | 0.1414 | 0.2106 | 0.2205 | 0.2496 | 0.2761 | 0.2542 |
| MSE | 5.4365 | 4.9983 | 4.9351 | 4.7513 | 4.58326 | 4.7220 |
| QLIKE | 0.2143 | 0.1973 | 0.1801 | 0.1734 | 0.1634 | 0.1680 |

Note: In-sample parameter estimates for weekly (h=5) and monthly (h=22) forecasting models. EURUSD in upper panel (Table 5a) and USDJPY in lower panel (Table 5b). Robust standard errors (s.e.) using Newey and West (1987) accommodate autocorrelation up to order 10 (h=5), and 44 (h=22), respectively. Superscripts *, ** and *** represent statistical significance in a two-sided t-test at 1%, 5%, and 10% levels.
Table 9. Out-of-sample forecast losses for alternative Q-models.

| EURUSD | HARQ | HARQ-J | CHARQ | SHARQ |
|---|---|---|---|---|
| MSE-RW | 0.9759 | 0.9693 | 0.9749 | 1.0613 |
| MSE-IW | 0.9742 | 0.9563 | 0.9567 | 1.0315 |
| QLIKE-RW | 0.9767 | 0.9845 | 0.9750 | 1.1473 |
| QLIKE-IW | 0.9952 | 0.9960 | 0.9893 | 0.9987 |

| USDJPY | HARQ | HARQ-J | CHARQ | SHARQ |
|---|---|---|---|---|
| MSE-RW | 0.8885 | 0.8916 | 0.8914 | 1.0953 |
| MSE-IW | 0.9446 | 0.9322 | 0.9389 | 0.8965 |
| QLIKE-RW | 0.8824 | 0.8471 | 0.9040 | 1.3887 |
| QLIKE-IW | 0.8949 | 0.8942 | 0.9178 | 0.8974 |

Note: Model performance, expressed as model loss normalized by the loss of the relevant baseline models without the Q-adjustment terms. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold.
Table 10. Day ahead out-of-sample forecast losses, 2020–2022 subsample.

| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.2522 | 1.0000 | 0.9781 | 0.9745 | 1.0041 | 1.0425 | 0.9517 | 0.9304 |
| MSE-IW | 1.2068 | 1.0000 | 0.9813 | 0.9764 | 0.9979 | 1.0976 | 0.9806 | 0.9677 |
| QLIKE-RW | 1.3216 | 1.0000 | 1.0169 | 0.9829 | 1.0093 | 1.1370 | 0.9446 | 0.9065 |
| QLIKE-IW | 1.5585 | 1.0000 | 1.0085 | 1.0119 | 1.0059 | 1.2338 | 0.9725 | 0.9701 |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0930 | 1.0000 | 1.0555 | 0.9909 | 0.9822 | 0.9895 | 0.9564 | 0.9348 |
| MSE-IW | 1.1099 | 1.0000 | 0.9958 | 0.9850 | 1.0112 | 1.0523 | 1.0071 | 0.9827 |
| QLIKE-RW | 1.3404 | 1.0000 | 1.2635 | 1.0136 | 0.9845 | 0.9509 | 0.8611 | 0.8677 |
| QLIKE-IW | 1.4766 | 1.0000 | 0.9939 | 0.9808 | 1.0108 | 1.0231 | 0.8453 | 0.7868 |

Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold.
Note: The table contains summary statistics for the daily RVs for EURUSD and USDJPY. ρ1 is the standard first-order autocorrelation coefficient. Sample period: 1 January 2010 to 31 December 2022.
| EURUSD | AR | HAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|
| β0 | 1.3663* | 0.3961* | 0.7428* | 0.2785* | −0.0651 |
| s.e. | 0.1843 | 0.0598 | 0.0969 | 0.0586 | 0.0685 |
| β1 | 0.5530* | 0.2364* | 0.7903* | 0.4349* | 0.3740* |
| s.e. | 0.0653 | 0.0730 | 0.0388 | 0.0754 | 0.0792 |
| β2 | | 0.3767* | | 0.3072* | 0.4613* |
| s.e. | | 0.0717 | | 0.0697 | 0.1031 |
| β3 | | 0.2572* | | 0.1850* | 0.2398* |
| s.e. | | 0.0532 | | 0.0515 | 0.0822 |
| β1Q | | | −2.4914* | −1.3708* | −0.9710* |
| s.e. | | | 0.3377 | 0.1939 | 0.2266 |
| β2Q | | | | | −1.7578*** |
| s.e. | | | | | 0.8706 |
| β3Q | | | | | −3.9819* |
| s.e. | | | | | 1.1618 |
| R² | 0.3058 | 0.3956 | 0.3685 | 0.4101 | 0.4166 |
| MSE | 6.3005 | 5.4852 | 5.7315 | 5.3538 | 5.2950 |
| QLIKE | 0.1647 | 0.1230 | 0.1540 | 0.1217 | 0.1199 |

| USDJPY | AR | HAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|
| β0 | 2.3073* | 1.0682* | 1.3537* | 0.7811* | 0.5218* |
| s.e. | 0.2362 | 0.1381 | 0.2207 | 0.1429 | 0.1328 |
| β1 | 0.2854* | 0.1819* | 0.6180* | 0.5177*** | 0.4416* |
| s.e. | 0.0804 | 0.0806 | 0.0853 | 0.1106 | 0.1260 |
| β2 | | 0.1441** | | 0.0542 | 0.2345*** |
| s.e. | | 0.0585 | | 0.0543 | 0.1072 |
| β3 | | 0.3443* | | 0.2188* | 0.2228* |
| s.e. | | 0.0499 | | 0.0493 | 0.0658 |
| β1Q | | | −0.2295* | −0.1967* | −0.1526* |
| s.e. | | | 0.0318 | 0.0386 | 0.0476 |
| β2Q | | | | | −0.2296** |
| s.e. | | | | | 0.0849 |
| β3Q | | | | | −0.3573* |
| s.e. | | | | | 0.1142 |
| R² | 0.0814 | 0.1154 | 0.1489 | 0.1581 | 0.1642 |
| MSE | 33.6096 | 32.3668 | 31.1409 | 30.8063 | 30.5818 |
| QLIKE | 0.3214 | 0.2561 | 0.2663 | 0.2377 | 0.2242 |

Note: The table contains in-sample parameter estimates and corresponding standard errors (White, 1980), together with R². MSE and QLIKE computed from (12) and (13). Superscripts *, **, and *** represent statistical significance in a two-sided t-test at 1%, 5% and 10% levels, respectively.
| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.1483 | 1.0000 | 1.0088 | 0.9945 | 1.0080 | 1.0311 | 0.9759 | 0.9655* |
| MSE-EW | 1.1619 | 1.0000 | 0.9984 | 0.9908 | 1.0050 | 1.0266 | 0.9742 | 0.9720* |
| QLIKE-RW | 1.3153 | 1.0000 | 0.9907 | 0.9813 | 1.0078 | 1.1575 | 0.9767 | 0.9582* |
| QLIKE-EW | 1.3915 | 1.0000 | 0.9907 | 0.9944 | 1.0052 | 1.1927 | 0.9952 | 0.9721** |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0502 | 1.0000 | 1.0053 | 0.9979 | 1.0238 | 0.8907 | 0.8885 | 0.8832* |
| MSE-EW | 1.0475 | 1.0000 | 1.0243 | 1.0133 | 1.0515 | 0.9558 | 0.9446 | 0.9376* |
| QLIKE-RW | 1.2320 | 1.0000 | 1.0748 | 0.9944 | 0.9811 | 0.9482 | 0.8824 | 0.8667* |
| QLIKE-EW | 1.3066 | 1.0000 | 1.0023 | 0.9800 | 0.9941 | 1.0039 | 0.8949 | 0.8519* |

Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold. Asterisks * and ** denote 1% and 5% confidence levels from one-sided Diebold-Mariano tests of superior performance of the best performing model compared to the HAR model.
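The normalization described in the note above can be illustrated with a minimal sketch. Equations (12) and (13) are not reproduced in this section, so the loss definitions here are assumptions: MSE as the mean squared forecast error, and QLIKE in the common form RV/F − log(RV/F) − 1. The function names and the toy numbers are hypothetical.

```python
import math


def mse_loss(rv, forecast):
    """Mean squared error between realized variance and its forecast."""
    return sum((a - f) ** 2 for a, f in zip(rv, forecast)) / len(rv)


def qlike_loss(rv, forecast):
    """Assumed QLIKE form: mean of RV/F - log(RV/F) - 1 (zero for a perfect forecast)."""
    return sum(a / f - math.log(a / f) - 1.0 for a, f in zip(rv, forecast)) / len(rv)


def loss_ratios(rv, forecasts, loss, benchmark="HAR"):
    """Divide each model's average loss by the benchmark model's average loss."""
    base = loss(rv, forecasts[benchmark])
    return {name: loss(rv, f) / base for name, f in forecasts.items()}


# Toy usage with made-up numbers (not results from the study).
rv = [1.0, 2.0]
forecasts = {"HAR": [1.5, 1.5], "HARQ": [1.0, 2.0]}
mse_ratios = loss_ratios(rv, forecasts, mse_loss)
qlike_ratios = loss_ratios(rv, forecasts, qlike_loss)
```

A ratio below one means the model has a lower average loss than the HAR benchmark over the evaluation sample.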
(a) Bottom 95% RQ

| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.1156 | 1.0000 | 0.9937 | 0.9907 | 1.0021 | 1.0636 | 0.9925 | 0.9794 |
| MSE-IW | 1.1175 | 1.0000 | 0.9887 | 0.9885 | 1.0020 | 1.0711 | 0.9967 | 0.9866 |
| QLIKE-RW | 1.3299 | 1.0000 | 0.9975 | 0.9855 | 1.0071 | 1.1598 | 0.9745 | 0.9555 |
| QLIKE-IW | 1.4108 | 1.0000 | 0.9956 | 0.9980 | 1.0055 | 1.1995 | 0.9944 | 0.9720 |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0330 | 1.0000 | 1.0146 | 0.9984 | 0.9940 | 0.9592 | 0.9526 | 0.9495 |
| MSE-IW | 1.0590 | 1.0000 | 0.9962 | 0.9925 | 1.0001 | 0.9849 | 0.9681 | 0.9601 |
| QLIKE-RW | 1.2507 | 1.0000 | 1.1353 | 0.9877 | 0.9829 | 0.9542 | 0.8797 | 0.8450 |
| QLIKE-IW | 1.3266 | 1.0000 | 0.9883 | 0.9734 | 0.9993 | 1.0100 | 0.8887 | 0.8434 |

(b) Top 5% RQ

| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.2276 | 1.0000 | 1.0453 | 1.0036 | 1.0225 | 0.9523 | 0.9355 | 0.9316 |
| MSE-IW | 1.2642 | 1.0000 | 1.0206 | 0.9960 | 1.0121 | 0.9218 | 0.9224 | 0.9382 |
| QLIKE-RW | 1.0876 | 1.0000 | 0.8851 | 0.9152 | 1.0186 | 1.1223 | 1.0116 | 0.9996 |
| QLIKE-IW | 1.0902 | 1.0000 | 0.9141 | 0.9389 | 1.0006 | 1.0856 | 1.0081 | 0.9745 |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F |
|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0674 | 1.0000 | 1.0025 | 0.9974 | 1.0535 | 0.9425 | 0.8700 | 0.8518 |
| MSE-IW | 1.0347 | 1.0000 | 1.0566 | 1.0365 | 1.1090 | 0.9246 | 0.9183 | 0.9126 |
| QLIKE-RW | 1.0202 | 1.0000 | 1.5755 | 1.0697 | 0.9601 | 0.8803 | 0.9135 | 0.9999 |
| QLIKE-IW | 1.0544 | 1.0000 | 1.1789 | 1.0628 | 0.9279 | 0.9278 | 0.9730 | 0.9588 |

Note: The table segments the results in Table 3 according to RQ. The bottom panel shows the ratios for days following a value of RQ in the top 5%. The top panel shows the results for the remaining 95% of the sample. Ratio for the best performing model on each row in bold.
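The segmentation described in the note above can be sketched as follows: per-day forecast losses are split by whether the previous day's RQ falls in the top 5% of the sample, and the average loss is computed within each group. This is a minimal illustration under stated assumptions; the function name, the tie handling at the threshold, and the toy numbers are hypothetical and not taken from the study.

```python
def mean_loss_by_rq_regime(losses, rq_lagged, top_share=0.05):
    """Return (mean loss on bottom days, mean loss on top-RQ days).

    losses[i] is the forecast loss for day i and rq_lagged[i] is the RQ
    observed on the preceding day. Days whose lagged RQ is at or above the
    threshold form the top group (ties at the threshold go to the top group).
    """
    n_top = max(1, int(top_share * len(rq_lagged)))
    threshold = sorted(rq_lagged)[-n_top]
    top = [l for l, q in zip(losses, rq_lagged) if q >= threshold]
    bottom = [l for l, q in zip(losses, rq_lagged) if q < threshold]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(bottom), mean(top)


# Toy usage with made-up numbers: 20 days, the last one has the highest RQ.
rq_lagged = [float(i) for i in range(1, 21)]
har_losses = [1.0] * 19 + [4.0]
harq_losses = [1.0] * 19 + [2.0]

har_bottom, har_top = mean_loss_by_rq_regime(har_losses, rq_lagged)
harq_bottom, harq_top = mean_loss_by_rq_regime(harq_losses, rq_lagged)

# Ratios relative to HAR within each regime, as in the panels above.
ratio_bottom = harq_bottom / har_bottom
ratio_top = harq_top / har_top
```

In this toy example the HARQ ratio equals one on ordinary days and falls below one only after the high-RQ day, which is the pattern such a split is designed to reveal.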
| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.3063 | 1.0000 | 0.9636 | 0.9884 | 1.0017 | 1.1459 | 0.9677 | 0.9024* | 0.9205 |
| MSE-EW | 1.2702 | 1.0000 | 0.9433 | 0.9559 | 0.9997 | 1.1288 | 0.9501 | 0.8996* | 0.9117 |
| QLIKE-RW | 1.5923 | 1.0000 | 0.9819 | 0.9840 | 0.9995 | 1.3558 | 0.9932 | 0.8701 | 0.9283 |
| QLIKE-EW | 1.7682 | 1.0000 | 0.9874 | 1.0031 | 1.0033 | 1.4134 | 0.9648 | 0.8832* | 0.9297 |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0618 | 1.0000 | 0.9464 | 0.9509 | 0.9965 | 0.9064 | 0.8971 | 0.8393* | 0.8443 |
| MSE-EW | 1.1707 | 1.0000 | 1.0148 | 1.0021 | 1.0336 | 1.0194 | 0.9388 | 0.8993 | 0.8976* |
| QLIKE-RW | 1.3119 | 1.0000 | 1.0057 | 0.9910 | 0.9740 | 1.0493 | 0.9099 | 0.8246* | 0.8359 |
| QLIKE-EW | 1.3847 | 1.0000 | 0.9918 | 0.9768 | 1.0002 | 1.1391 | 0.9179 | 0.8350* | 0.8463 |

Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best-performing model on each row in bold. Asterisks * and ** denote 1% and 5% confidence levels from one-sided Diebold-Mariano tests of superior performance of the best-performing model compared to the HAR model.
| EURUSD | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.3289 | 1.0000 | 0.9876 | 0.9952 | 1.0003 | 1.1876 | 0.9625 | 0.8803* | 0.9004 |
| MSE-IW | 1.3265 | 1.0000 | 0.9759 | 1.0010 | 1.0044 | 1.1707 | 0.9537 | 0.8723 | 0.9070 |
| QLIKE-RW | 1.4301 | 1.0000 | 0.9945 | 0.9950 | 0.9982 | 1.2380 | 0.9622 | 0.9215* | 0.9279 |
| QLIKE-IW | 1.5155 | 1.0000 | 0.9951 | 1.0051 | 1.0011 | 1.2596 | 0.9599 | 0.9333 | 0.9784 |

| USDJPY | AR | HAR | HAR-J | CHAR | SHAR | ARQ | HARQ | HARQ-F | HARQ-h |
|---|---|---|---|---|---|---|---|---|---|
| MSE-RW | 1.2529 | 1.0000 | 1.0215 | 1.0086 | 0.9893 | 1.5820 | 1.0500 | 1.0070 | 0.9621* |
| MSE-IW | 1.2547 | 1.0000 | 1.0073 | 1.0029 | 1.0119 | 1.1181 | 0.9620 | 0.9495* | 0.9780 |
| QLIKE-RW | 1.1937 | 1.0000 | 1.0023 | 0.9963 | 0.9893 | 1.0313 | 0.9307 | 0.9454 | 1.0318 |
| QLIKE-IW | 1.2894 | 1.0000 | 0.9959 | 0.9909 | 1.0000 | 1.1453 | 0.9452 | 0.8932* | 1.0143 |

Note: Model performance, expressed as model loss normalized by the loss of the HAR model. Each row reflects a combination of estimation window and loss function. Ratio for the best performing model on each row in bold. Asterisks * and ** denote 1% and 5% confidence levels from one-sided Diebold-Mariano tests of superior performance of the best performing model compared to the HAR model.
Columns RQ through log(RQ): alternative RQ transformations. Columns HAR and HARQ: adding RQ^{1/2}.

| EURUSD | RQ | RQ^{1/2} | RQ^{−1/2} | RQ^{−1} | log(RQ) | HAR | HARQ |
|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0023 | 1.0000 | 1.0263 | 1.0246 | 1.0092 | 1.0309 | 1.0052 |
| MSE-IW | 1.0016 | 1.0000 | 1.0274 | 1.0265 | 1.0069 | 1.0292 | 1.0086 |
| QLIKE-RW | 1.0042 | 1.0000 | 1.0326 | 1.0304 | 1.0007 | 1.0250 | 1.0067 |
| QLIKE-IW | 1.0014 | 1.0000 | 1.0064 | 1.0254 | 0.9937 | 1.0044 | 1.0164 |

| USDJPY | RQ | RQ^{1/2} | RQ^{−1/2} | RQ^{−1} | log(RQ) | HAR | HARQ |
|---|---|---|---|---|---|---|---|
| MSE-RW | 1.0001 | 1.0000 | 1.1345 | 1.1225 | 1.0516 | 1.1202 | 1.0118 |
| MSE-IW | 1.0049 | 1.0000 | 1.0606 | 1.0543 | 0.9931 | 1.0512 | 1.0186 |
| QLIKE-RW | 1.0097 | 1.0000 | 1.1439 | 1.1067 | 0.9794 | 1.0731 | 1.0455 |
| QLIKE-IW | 1.0188 | 1.0000 | 1.1105 | 1.0841 | 0.9322 | 1.0358 | 0.9989 |

Note: Model performance, expressed as model loss normalized by the loss of the HARQ model that relies on RQ^{1/2}. Each row reflects a combination of estimation window and loss function. Ratio for the best-performing model on each row in bold. The left panel reports the results based on alternative RQ interaction terms. The right panel reports the results from including RQ^{1/2} as an explanatory variable.
Figure 1. Amino acid frequency maps of serine proteases over their substrate sites
Figure 2. Amino acid frequency maps of metalloproteases over their substrate sites
Figure 3. Amino acid frequency maps of cysteine proteases over their substrate sites
Figure 4. Amino acid frequency maps of aspartic proteases over their substrate sites
Figure 5. Exemplary distance values depicting four proteases. The dots in the graph represent proteases and the values above the edge, connecting the two dots, represent the distance value between the corresponding two proteases
Figure 6. Schematic depicting the phylogenetic tree of proteases over the P4–P4' substrate sequences. Four colors distinguish the analyzed proteases according to their catalytic type: serine proteases are colored in orange, aspartic proteases in blue, cysteine proteases in purple, and metalloproteases in yellow
Figure 7. Heatmap and clustering analysis of proteases over the P4–P4' substrate sequences. Color bar from red to green represents the order of similarity from high to low
Figure 8. Schematic depicting the phylogenetic tree of proteases over the P1 substrate sequences. Proteases are colored according to their catalytic type: serine proteases are colored in orange, aspartic proteases in blue, cysteine proteases in purple, and metalloproteases in yellow