1. Introduction
Methods for computing Value at Risk (VaR) fall into three major classes: the parametric method introduced by the J.P. Morgan Risk Metrics system, the non-parametric method based on historical simulation, and the semi-parametric method based on the Extreme Value Theory (EVT) tail distribution. These methods rely on a good approximation of the probability distribution extracted from asset market prices. The regulatory capital requirement for market risk is typically determined using the VaR for the aggregate trading portfolio over a ten-day horizon at the 99% confidence level, and by the performance of the bank's VaR models in backtesting exercises (Zumbach, 2007). Backtesting is the best way to check the performance of a risk model. In backtesting, the estimated VaR is compared with the actual return over the same period. A VaR exceedance occurs when the return is more negative than the VaR. To backtest the accuracy of the estimated VaRs, we compute the empirical failure rates. By definition, the failure rate is the proportion of times that returns exceed the forecasted VaR. If the model is correctly specified, the failure rate should be equal to the specified VaR level. In this paper, VaR backtesting relies on the frameworks proposed by Christoffersen (2006) and Kupiec (1995). Dumitrescu et al. (2012) have also proposed a backtest based on a non-linear regression model. Backtesting is a formal statistical framework that verifies whether actual trading losses are in line with model-generated VaR forecasts, and it relies on tests over VaR violations. A violation is said to occur when the realized trading loss exceeds the VaR forecast.
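As a minimal sketch of the failure-rate calculation described above, the following Python function marks an exceedance whenever the realized return is more negative than the VaR forecast, with VaR expressed as a (negative) return quantile. The function name and the sample numbers are illustrative assumptions, not taken from the paper.

```python
def failure_rate(returns, var_forecasts):
    """Return (failure rate, hit sequence) for aligned returns and VaR forecasts.

    A hit is recorded when the return is strictly more negative than the
    VaR forecast for the same day (VaR given as a negative return quantile).
    """
    hits = [1 if r < q else 0 for r, q in zip(returns, var_forecasts)]
    return sum(hits) / len(hits), hits


# Illustrative data only: four days, constant VaR forecast of -2%.
rate, hits = failure_rate([-0.03, 0.01, -0.005, -0.02], [-0.02] * 4)
print(rate, hits)
```

For a correctly specified model, the rate computed this way over a long backtest should be close to the VaR probability p.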
The Risk Metrics method for computing VaR is set out in Zumbach (2006) and Jorge and Jerry (2001). The original Risk Metrics method is underpinned by the assumption that daily asset returns are conditionally Gaussian, independently and identically distributed (iid) random variables with a mean of zero. Under this assumption, VaR at the one-day horizon can be computed by multiplying the relevant quantile of the standard normal distribution by the one-day-ahead forecast of the conditional standard deviation of the portfolio return and multiplying the result by the mark-to-market value of the portfolio. A convenient consequence of the iid assumption is that VaR for longer horizons can be computed by multiplying the daily VaR by the square root of the time horizon in days. Another method for computing VaR is historical simulation (HS), where the relevant quantile from the empirical distribution of simulated returns is applied (see Jorion, 2007, for details on both methods). Emenogu et al. (2020) found that the persistence of the GARCH models is robust, with the exception of a few cases where IGARCH and EGARCH were unstable. The SGARCH and GJR-GARCH models also failed to converge for Student's t innovations, and the mean-reverting number of days for returns varied between models. Altun (2020) also found that GARCH models specified under the TSLx innovation distribution produce more accurate VaR forecasts than other competing models. Slim et al. (2017) claimed that in developed markets, the related models show signs of long memory, suggesting that the FIGARCH model is preferable to the GARCH and GJR models. In frontier and emerging markets, the GJR and GARCH are the most important specifications for capturing risk. This means that when analyzing frontier markets, risk managers should favor models that account for asymmetry.
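The two orthodox calculations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's implementation: the conditional standard deviation is forecast with an EWMA recursion seeded with the first squared return (the seed is an assumption), VaR is reported as a negative return quantile scaled by the portfolio value, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def ewma_volatility(returns, lam=0.94):
    """One-day-ahead EWMA volatility forecast for zero-mean returns."""
    var = returns[0] ** 2  # seed value: an assumption of this sketch
    for r in returns[1:]:
        var = lam * var + (1.0 - lam) * r ** 2
    return math.sqrt(var)


def riskmetrics_var(returns, p=0.01, value=1.0, horizon=1, lam=0.94):
    """Normal quantile x volatility forecast x portfolio value x sqrt(horizon)."""
    z = NormalDist().inv_cdf(p)  # negative for p < 0.5
    return z * ewma_volatility(returns, lam) * value * math.sqrt(horizon)


def historical_var(returns, p=0.01, value=1.0):
    """Historical simulation: empirical p-quantile of past returns."""
    ordered = sorted(returns)
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k] * value


rets = [0.01, -0.02, 0.015, -0.03, 0.005, -0.01, 0.02, -0.025]
print(riskmetrics_var(rets), riskmetrics_var(rets, horizon=10))
print(historical_var(rets, p=0.25))
```

The ten-day figure is the one-day figure multiplied by the square root of ten, which is exactly the scaling that the iid assumption licenses.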
In practice, the Risk Metrics and HS methods for computing VaR have been the most popular methods within the industry. Note, however, that the HS method is highly restrictive since it ignores any additional explanatory information that might be thought to have an impact on the relevant return quantile (such as financial or macroeconomic information). This information can be incorporated into the VaR computed using the Risk Metrics method, but only through the conditional mean or the conditional variance of returns. The other weakness of the original Risk Metrics method is that it assumes normality of the returns. In practice, empirical evidence suggests that for many financial prices, the conditional distribution is fat-tailed.
VaR can also be computed using quantile regression (QR) (Jorge and Jerry, 2001; see, for example, Taylor 1999, 2018). When QR is used, the relevant return quantile can be modelled as a function of contemporaneous or lagged explanatory variables. In addition, QR-based methods for VaR are more flexible than the Risk Metrics method since they do not require returns to be conditionally Gaussian. However, QR has several drawbacks. For example, the linearity assumption is highly questionable when data are heteroscedastic (see Kupiec 1995, Peracchi 2002), and while nonlinear QR estimation techniques have been proposed, the asymptotic theory is not well developed. Furthermore, with both linear and nonlinear QR, the presence of heteroscedasticity can lead to estimated quantiles that cross each other, and in both cases, robust inference typically requires bootstrapping, which increases the computational cost of the method. Tabasi et al. (2019) used GARCH models to model the volatility-clustering feature and found that using the Student's t distribution function instead of the normal distribution function improved model parameter estimation. Nieto and Ruiz (2016) compared the forecasting potential of various GARCH-based VaR models to their alternatives in an updated report. Surprisingly, they found that forecasting outcomes are affected by the number of out-of-sample observations as well as the time span being studied. They concluded that no single model outperforms the others. Furthermore, only the asymmetric EGARCH-based model with a skewed Student's t distribution can be approved under the various model tests. Thavaneswaran et al. (2020) have introduced a volatility estimator applying an estimating function approach.
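The paragraph above does not spell out the QR objective. As standard background, a quantile estimate minimizes the asymmetric "check" (pinball) loss. The sketch below illustrates this for the simplest case with no explanatory variables, where the minimizer is a sample quantile. The function names and data are illustrative assumptions, and this is not the interior point algorithm used later in the paper.

```python
def pinball_loss(q, returns, p):
    """Average check loss of a candidate quantile q at probability p."""
    total = 0.0
    for r in returns:
        u = r - q
        total += u * (p - (1.0 if u < 0 else 0.0))
    return total / len(returns)


def quantile_by_check_loss(returns, p):
    """Unconditional p-quantile: the sample value that minimizes the check loss."""
    return min(returns, key=lambda q: pinball_loss(q, returns, p))


rets = [-0.04, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.05, 0.06, 0.08]
print(quantile_by_check_loss(rets, 0.25))
```

In a conditional QR model, the scalar q is replaced by a function of the explanatory variables and the same loss is minimized over its parameters.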
I propose an alternative parametric method for computing VaR. This alternative method allows the practitioner to utilize additional explanatory information to forecast the relevant quantile. However, relative to QR-based methods, the method proposed here has the significant practical advantage that the conditional maximum likelihood (ML) technique can be used for parameter estimation and robust inference.
The proposed method exploits the inverse relationship between the conditional quantile function (QF) and the conditional cumulative distribution function (CDF), utilizing a technique for estimating the conditional CDF developed by Foresi and Peracchi (1995). I assume the mark-to-market value of the relevant portfolio is one; hence, the daily VaR depends only on a one-day-ahead forecast of the relevant quantile for portfolio returns. Rather than directly forecasting the VaR quantile, I propose that the forecast is obtained indirectly, using binary response models to compute probability forecasts over a grid of candidate quantile values. The candidate quantile value with an associated probability forecast closest to the desired probability (e.g., p = 0.01 for VaR at the 99% confidence level) is used as the VaR. This method is equivalent to forecasting points on the left tail of the conditional CDF and then inverting at the required VaR probability (Jelito and Pitera 2021).
Binary response models have previously been shown to be useful for forecasting the direction of asset returns. For example, Christoffersen and Diebold (2006) use a Logit model that conditions on volatility as a predictor to forecast the direction of a time series index. Binary response models that estimate points on the conditional CDF for stock returns have not previously been used for computing VaR. For brevity, we will refer to the method proposed here as the BRV (Binary Response VaR) method. Ugurlu (2023) has also proposed a coherent multivariate average VaR to quantify the total risk.
I compare the empirical performance of the BRV method with the orthodox Risk Metrics and HS methods and a QR method in Monte Carlo simulations and an empirical application. The empirical application involves recursively computing daily VaR for two stock market indices, the Dow Jones Industrial Average (INDEXDJX: DJI) and the Dow Jones U.S. Marine Transportation Index (DJUSMT), over the three-year period 02/01/06–31/12/08, using the previous five years of daily data for parameter estimation on each day. The results are analyzed using tests for correct unconditional coverage and independence of the VaR exceedances. Computing VaR over this period is challenging, as the period begins with benign market conditions and ends with extremely volatile conditions associated with the global financial crisis that began in 2007. I found that the BRV and QR methods clearly dominate the Risk Metrics and HS methods over this period. In particular, it appears that the Risk Metrics and HS methods consistently underestimate the population VaR over this period, since the proportion of VaR exceedances is too large for the given confidence levels. In contrast, exceedances when the BRV method is used are much closer to the expected number. Underestimating the population VaR can lead to serious penalties for banks operating in countries where the Basel II Capital Accord has been implemented. Hence, this result is of practical importance.
2. Materials and methods
2.1. BRV method
Define the probability of the log portfolio return $R_t$ exceeding (that is, falling at or below) a threshold $r_i$, conditional on a known $k \times 1$ vector of predictors $X_{t-1}$, as

$$P_{i,t} = \Pr(R_t \le r_i \mid X_{t-1}), \tag{1}$$

where $-\infty < r_i < \infty$ $(i = 1, 2, \ldots, N)$ and $Y_{i,t}$ is a binary indicator,

$$Y_{i,t} = \begin{cases} 1 & \text{if } R_t \le r_i, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Note that $P_{i,t}$ can be interpreted as the value of the conditional CDF for $R_t$ evaluated at $r_i$.
The VaR return quantile at the confidence level $(1 - p) \times 100\%$, $0 < p < 1$, is the value $r_i$ such that $\Pr(R_t \le r_i \mid X_{t-1}) = p$. Let this be denoted by $Q_t(p)$. In practice, the aim when computing VaR is to forecast the future value $Q_{T+h}(p)$. Throughout this paper we focus on computing daily VaR, so $T$ denotes the current day and $h = 1$.
The BRV method for forecasting $Q_{T+1}(p)$ proposed here has three clear steps:
(ⅰ) estimate multiple binary response models with the binary indicator (2) as the dependent variable over a range of candidate VaR return quantiles, $r_i$, using conventional ML (the form of the link function in the binary response model and the location and number of values for $r_i$ will be discussed below);
(ⅱ) project the estimated binary response models forward to compute a one-step-ahead forecast of the probability of exceeding each candidate threshold $r_i$,

$$\hat{p}_{i,T+1} = \widehat{\Pr}(R_{T+1} \le r_i \mid X_T); \tag{3}$$

(ⅲ) as a forecast of the VaR quantile $Q_{T+1}(p)$, use the threshold $r_i$ from (ⅰ) that minimizes the distance between the probability forecast $\hat{p}_{i,T+1}$ and the desired VaR probability $p$,

$$\hat{Q}_{T+1}(p) = \arg\min_{r_i} \left| \hat{p}_{i,T+1} - p \right|. \tag{4}$$
The use of binary response models to estimate the conditional CDF for stock returns was proposed by Foresi and Peracchi (1995). Let $-\infty < r_1 < r_2 < \cdots < r_N < \infty$ be $N$ feasible values of the return over the conditional CDF for $R_t$. Foresi and Peracchi (1995) show that points on the conditional CDF corresponding to $r_i$ $(i = 1, 2, \ldots, N)$ can be estimated using a functional form that best approximates the population conditional CDF, $F_t$, as the link function in separate binary response models with the binary indicator (2) as the dependent variable. The "best approximation" is formalized as the approximation that minimizes the Kullback–Leibler divergence. Under weak regularity conditions, the parameters of the binary response model can be consistently estimated by ML, giving the estimated point $\hat{F}_{i,t}$.
Clearly, the conditional CDF should satisfy the standard condition,

$$0 \le F_{i,t} \le 1. \tag{5}$$

An attractive feature of the Foresi and Peracchi (1995) technique is that, since it involves modelling the log-odds $\ln[F_{i,t}/(1 - F_{i,t})]$ rather than $F_{i,t}$ directly, (5) is automatically satisfied. Foresi and Peracchi (1995) use a semi-parametric Logit model in their empirical work. In principle, any twice-differentiable CDF can be used. Note that monotonicity of the estimated CDF,

$$\hat{F}_{1,t} \le \hat{F}_{2,t} \le \cdots \le \hat{F}_{N,t}, \tag{6}$$

will not necessarily be satisfied if the ML estimation is unrestricted. Whether monotonicity is satisfied or not depends on numerous factors, including the sample size and the spread of the $r_i$ values. In practice, even if monotonicity is violated, this might not have a serious detrimental impact on the practical performance of the technique. Monotonicity can be incorporated into the estimation algorithm if in practice it is a serious problem.
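The paper does not specify how monotonicity would be imposed. One simple post-estimation repair, shown here purely as an assumption of this sketch and not as the restricted ML estimation alluded to above, is to replace each estimated point by the running maximum over the ordered thresholds. The function name is hypothetical.

```python
def enforce_monotone(probabilities):
    """Return a non-decreasing copy of CDF estimates ordered by threshold.

    Each value is replaced by the largest estimate seen so far, so any dip
    caused by unrestricted estimation is flattened rather than re-estimated.
    """
    fixed = []
    current = 0.0
    for prob in probabilities:
        current = max(current, prob)
        fixed.append(current)
    return fixed


# Illustrative estimates at r_1 < r_2 < r_3 < r_4 with one violation.
print(enforce_monotone([0.004, 0.012, 0.009, 0.05]))
```

Estimates that are already monotone pass through unchanged, so the repair only matters in the cases the text flags as potentially problematic.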
Foresi and Peracchi (1995) focus on estimating points on the conditional CDF using binary response models. Here, estimated binary response models are used to produce forecasts of the probability of exceeding candidate quantiles in the left tail of the conditional CDF (i.e., forecasts of points on the left tail of the conditional CDF). In the simulation and empirical work here, for simplicity, the cumulative normal and logistic CDFs are used. For example, when the logistic is used,

$$F_{i,t} = \frac{\exp(X_{t-1}'\beta_i)}{1 + \exp(X_{t-1}'\beta_i)}, \tag{7}$$

where $\beta_i$ is a $k \times 1$ vector of parameters and $X_{t-1}$ is the vector of predictors in (1). The conditional CDF $F_{i,t}$ can then be estimated by replacing $\beta_i$ in (7) with the ML estimator $\hat{\beta}_i$.¹
¹ The asymptotic properties of the ML estimator for a Logit model are well known and are omitted.
Step (ⅱ) of the BRV method is to use the estimated parameters from (ⅰ) to compute a one-step-ahead probability forecast for each candidate threshold $r_i$. Therefore, when the logistic CDF is used,

$$\hat{p}_{i,T+1} = \frac{\exp(X_T'\hat{\beta}_i)}{1 + \exp(X_T'\hat{\beta}_i)}, \tag{8}$$

where $X_T$ is the vector of predictors at time $T$. Step (ⅲ) of the BRV method involves finding the relevant VaR, $\hat{r}_i = \arg\min_{r_i} |\hat{p}_{i,T+1} - p|$. This can be done using a simple linear grid search. To implement this method, the practitioner needs to decide on a functional form for the link function, on the total number of thresholds $r_i$ to use (the size of $N$), and on their location and spacing. In the empirical application below, for simplicity, the logistic CDF is used as a link function. Similar link functions can also be used (e.g., normal CDF, Student's t CDF, Generalized Extreme Value CDF, semi-parametric link functions, etc.) and, in practice, backtesting over a historical sample period could be employed to select the best performing link function from a set of candidate functions. In the simulations and empirical applications discussed below, I found in backtesting that a grid of $r_i$ values starting with the third value of the order statistics for the historical returns, followed by the 1st, 3rd, 5th, 10th and 15th percentiles, produces good results (thus $N = 6$). Cubic spline interpolation is then used to increase the number of forecasts and thresholds.
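Steps (ⅰ) to (ⅲ) can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the paper: the Logit contains a constant and a single lagged predictor, ML is carried out by a hand-written Newton-Raphson iteration with a tiny ridge term for numerical stability, the cubic spline interpolation step is omitted, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(y, x, iters=100, ridge=1e-9):
    """Newton-Raphson ML for a Logit with a constant and one predictor."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, x):
            pi = sigmoid(b0 + b1 * xi)
            w = pi * (1.0 - pi)
            g0 += yi - pi            # score for the constant
            g1 += (yi - pi) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        h00 += ridge                 # assumption: keeps the 2x2 system invertible
        h11 += ridge
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def brv_var(returns, x_lagged, x_now, thresholds, p):
    """Fit one Logit per threshold, forecast, and pick the closest threshold.

    returns[t] is paired with x_lagged[t] (the predictor dated one day
    earlier); x_now is the latest predictor value used for the forecast.
    """
    forecasts = []
    for r in thresholds:
        y = [1.0 if ret <= r else 0.0 for ret in returns]  # binary indicator
        b0, b1 = fit_logit(y, x_lagged)                    # step (i)
        forecasts.append(sigmoid(b0 + b1 * x_now))         # step (ii)
    best = min(range(len(thresholds)), key=lambda i: abs(forecasts[i] - p))
    return thresholds[best], forecasts                     # step (iii)


# Illustrative data: 100 evenly spaced returns and an uninformative predictor.
rets = [i / 100 for i in range(-50, 50)]
grid = [rets[0], rets[2], rets[4], rets[9]]
var_quantile, probs = brv_var(rets, [0.0] * len(rets), 0.0, grid, p=0.05)
print(var_quantile, probs)
```

With an uninformative predictor, each fitted probability collapses to the empirical frequency of returns at or below the threshold, so the method picks the same grid point that historical simulation would. Informative predictors shift the forecasts away from those unconditional frequencies.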
Clearly, if the link function has exactly the same form as the population conditional CDF for returns (e.g., normal-normal or logistic-logistic) and appropriate regressors are employed, then as $T \to \infty$, the Foresi and Peracchi (1995) method provides a consistent estimator of points on the left tail of the conditional CDF, provided that the regularity conditions required for ML to be a consistent estimator in this instance are satisfied. To illustrate this in action, assume the following Data Generating Process (DGP) for log returns,
where $L(0, 1)$ and $N(0, 1)$ denote a logistic and a normal distribution with a mean of zero and a variance of one. Therefore, conditional on the $X$ variables generated by AR(1) models, returns have a logistic distribution. We simulate representative series of returns from this general DGP. For one set, the following parameter values are used: $\gamma_1 = 0.50$, $\gamma_2 = 0$, $\theta_1 = 0.30$. For the other set: $\gamma_1 = 0.50$, $\gamma_2 = 0.50$, $\theta_1 = 0.30$, $\theta_2 = 0.30$. Therefore, in the first set of simulations the model contains a single stationary regressor, while in the second set the model contains two stationary regressors. Observations from (9)–(12) are simulated for the following sample sizes: $T$ = 100, 200, 500, 1000, 10000. For each series, we then estimate the left tail of the conditional CDF using the method of Foresi and Peracchi (1995), employing the logistic CDF as the link function over a grid of $r_i$ values starting with the third value of the order statistics for the historical returns, followed by the 1st, 3rd, 5th, 10th and 15th percentile values.² In both cases, a constant and the correct explanatory variables are used in the link function.
² Note that here the model is not predictive, since the explanatory variables are current values which we make no attempt to forecast. In the Monte Carlo simulations and empirical application discussed below, which involve forecasting, lagged values of the explanatory variables are used.
2.2. Monte Carlo simulation results
For each replication, VaR is computed using QR, along with the original Risk Metrics and HS methods. The true volatility is used when computing the Risk Metrics VaR.
To assess the finite-sample performance of each method, the estimated unconditional coverage $\hat{p}$ is computed for each replication. The 5th, 25th, 50th, 75th and 95th percentiles of the empirical distribution of $\hat{p}$ are reported in Table 1 for the 95% confidence level and in Table 2 for the 99% confidence level. In Tables 1 and 2, these are reported in four rows: the first row is the Gaussian GARCH data generating process (DGP), the second row is the t-GARCH DGP, the third row is the autoregressive Gaussian GARCH DGP, and the fourth row is the autoregressive t-GARCH DGP. The choice of a confidence level for an interval determines the probability that the resulting confidence interval contains the true parameter value. Common choices for the confidence level are 0.95 and 0.99. These levels correspond to percentages of the area under the normal density curve. For example, a 95% confidence interval covers 95% of the area under the normal curve, so the probability of observing a value outside this area is 0.05. Because the normal curve is symmetric, half of the remaining area is in the left tail of the curve, and the other half is in the right tail.
The second to fourth simulation experiments are the same as the first but with different DGPs for the returns, allowing for serial correlation and conditionally non-Gaussian returns. The various DGPs for all the experiments are given below:
DGP 1. Gaussian-GARCH
DGP 2. t-GARCH
DGP 3. AR-Gaussian-GARCH
DGP 4. AR-t-GARCH
For DGPs 1 and 3, a normal CDF is used as the link function when using the BRV method (i.e. Probit models are estimated). For DGPs 2 and 4, a logistic CDF is used as the link function (Logit models are estimated). For DGPs 1 and 2, just a constant is included as a predictor in the relevant link function, and for DGPs 3 and 4, the link function also contains a lag of returns (hence, the estimated models are correctly specified).
For the 95% confidence level, it can be seen in Table 1 that the empirical distributions of $\hat{p}$ for the BRV and QR methods are virtually identical. They both have good levels of unconditional coverage given the small size of the backtesting period (250 observations). The distribution of $\hat{p}$ is centered close to the population value of $p = 0.05$, irrespective of the DGP. In both cases, the performance of these methods is similar, irrespective of whether returns are conditionally Gaussian or non-Gaussian. It can be shown that the logistic CDF closely approximates a Student's t CDF with 9 degrees of freedom (see Mudholkar and George, 1978). Hence, the BRV method using Logit models is clearly well suited to computing VaR if the conditional distribution for returns is thought to be fat-tailed. The results in Table 2 show that at the 99% confidence level, the BRV and QR methods also produce very similar results. Again, the distributions of $\hat{p}$ are centered close to the population value of $p = 0.01$ and both have a similar variance.
In contrast, however, it can be seen that at both confidence levels, the Risk Metrics method significantly underestimates the population VaR for DGPs 2 and 4 (non-Gaussian returns). For DGP 3, (Gaussian returns but with serial correlation), the Risk Metrics method, which ignores serial correlation, slightly underestimates the population VaR. The HS method gives similar results to the BRV and QR methods for all the DGPs at both confidence levels, and there is no distinguishable difference in the performance of the HS method for the DGPs with and without serial correlation.
3. Results
In this section, I discuss an empirical application involving the DJI and DJUSMT series. The application involves recursively computing the daily VaR at the 95% and 99% confidence levels using the BRV, QR, Risk Metrics, and HS methods for every trading day over the three-year period 02/01/06–31/12/08 (755 days), using a five-year window of historical data for parameter estimation (approximately 1250 observations). For example, VaR on 02/01/06 is computed using data from 31/12/00–31/12/05. Note that the parameters of the BRV and QR models are re-estimated each day. The conditional standard deviation $h_{t-1}^{1/2}$ is chosen as an important predictor following the evidence in Christoffersen and Diebold (2006) on its ability to forecast the direction of stock returns, and a positive relationship with $p_{i,t}$ is expected. $TB_{t-1}$ is included to allow for present value effects, and $V_{t-1}$ is included to capture market sentiment; for both, a positive relationship with $p_{i,t}$ is again expected. Conventional ML is used for parameter estimation in the BRV method, and the interior point algorithm of Koenker and Park (1978) is used for parameter estimation in the QR method. In the Risk Metrics method, an EWMA volatility forecast with a weight of 0.94 is used, which is the default choice for the Risk Metrics method applied to daily data.
Prior to discussing the backtesting results, as an example, the estimated Logit model parameters and robust t-statistics computed using Huber-White robust standard errors are given in Table 3 for each index and stock at three points over the backtesting period (the points are 29/12/06, 31/12/07, and 30/12/08). In each case, the 1st percentile of the order statistics is used to define the threshold. On the basis of the robust t-statistics, I found clear evidence that all three explanatory variables are statistically significant at one or more of these points, and that when statistically significant, the signs of the estimated parameters are consistent with our expectations. Note that in Table 3, $\hat{\beta}_0$ is the estimated constant, which is −13.162 for DJI and −10.708 for DJUSMT, and $\hat{\beta}_1$ is the estimated parameter on $TB_{t-1}$, which is 0.518 for DJI and −0.018 for DJUSMT. $\hat{\beta}_2$ is the estimated parameter on $V_{t-1}$, which is 0.119 for DJI and 0.107 for DJUSMT. $\hat{\beta}_3$ is the estimated parameter on $h_{t-1}^{1/2}$, which is 2.617 for DJI and 2.551 for DJUSMT. Robust t-statistics computed using Huber-White standard errors are in parentheses.
As one might expect, the exact statistical significance of the estimated parameters varies depending on the index or stock and sample period, but there are some patterns. For example, for each series, I found that $h_{t-1}^{1/2}$ is strongly statistically significant, but that the statistical significance of $TB_{t-1}$ and $V_{t-1}$ varies over the sample. Note that I do not eliminate statistically insignificant predictors prior to computing VaR using the BRV method; however, this approach could be taken in future research to allow for structural change.
4. Backtesting results
To summarize the backtesting results, for each method and at each confidence level, the estimated unconditional coverage $\hat{p}$ is reported for comparison with the population unconditional coverage $p$. Christoffersen (1998) proposes a complete methodology for evaluating the number of exceedances and their independence. The rationale for the independence test is that, if the violations are dependent, then the transition probabilities would not be equal. Finally, Christoffersen (1998) proposes a joint test that combines both hypotheses (the Conditional Coverage, CC, hypothesis). Tests of correct unconditional coverage ($LR_{uc}$) and of independence of the VaR exceedances ($LR_{ind}$) are proposed by Christoffersen (1998). The $LR_{uc}$ and $LR_{ind}$ tests utilize the fact that if the VaR method is perfect, then VaR exceedances should be unpredictable, and so a binary indicator of exceedances (the hit indicator),

$$I_{t+1} = \begin{cases} 1 & \text{if } R_{t+1} < \hat{Q}_{t+1}(p), \\ 0 & \text{otherwise,} \end{cases}$$

where $\hat{Q}_{t+1}(p)$ is the forecast of the relevant return quantile, should be an independent Bernoulli random variable. The $LR_{uc}$ and $LR_{ind}$ tests are straightforward to compute and have a $\chi^2(1)$ asymptotic distribution (see Christoffersen, 1998, for further details).
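A minimal sketch of the two likelihood-ratio statistics follows, written from the standard formulation in Christoffersen (1998): a binomial likelihood ratio for unconditional coverage and a first-order Markov likelihood ratio for independence. The function names and the clustered example sequence are illustrative assumptions, and the p-value helper uses the identity between the chi-square(1) upper tail and the complementary error function.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def lr_uc(hits, p):
    """Unconditional coverage: is the hit frequency equal to p?"""
    n1 = sum(hits)
    n0 = len(hits) - n1
    pi_hat = n1 / len(hits)
    ll_null = _xlogy(n0, 1 - p) + _xlogy(n1, p)
    ll_alt = _xlogy(n0, 1 - pi_hat) + _xlogy(n1, pi_hat)
    return -2.0 * (ll_null - ll_alt)


def lr_ind(hits):
    """Independence: do hits follow each other more often than non-hits?"""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1  # count transitions prev -> cur
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    ll_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
              + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    return -2.0 * (ll_null - ll_alt)


# Illustrative hit sequence: correct frequency (5%), but all hits clustered.
hits = [0] * 95 + [1] * 5
print(lr_uc(hits, 0.05), chi2_1_pvalue(lr_uc(hits, 0.05)))
print(lr_ind(hits), chi2_1_pvalue(lr_ind(hits)))
```

The example is chosen so that the coverage test is passed exactly while the independence test is rejected, which shows why the two statistics are reported separately.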
The backtesting results for the DJI at the 95% and 99% confidence levels, respectively, are given in Table 4. The results for the method that is optimal on the basis of the estimated unconditional coverage $\hat{p}$ relative to the population value $p$ are bolded. For DJI, at the 95% confidence level, the BRV result is $\hat{p} = 0.05$, suggesting that the population VaR over this period is estimated extremely well ($p = 0.05$). In contrast, the Risk Metrics, HS, and QR results are $\hat{p} = 0.061$, $\hat{p} = 0.110$, and $\hat{p} = 0.052$, respectively, suggesting that the population VaR is underestimated by each of these methods. The $LR_{uc}$ test rejects the null hypothesis of correct unconditional coverage for the Risk Metrics and HS methods at either the 5% or the 1% significance level. For DJI at the 99% confidence level, the Risk Metrics result is $\hat{p} = 0.029$, while the HS result is $\hat{p} = 0.045$ and the QR result is $\hat{p} = 0.034$.
On the basis of the estimated unconditional coverage $\hat{p}$, in both cases, the BRV method is superior to any of the other methods considered. For DJUSMT, at the 95% confidence level, the BRV result is $\hat{p} = 0.05$, suggesting that the population VaR over this period is estimated extremely well ($p = 0.05$). In contrast, the Risk Metrics, HS, and QR results are $\hat{p} = 0.065$, $\hat{p} = 0.088$, and $\hat{p} = 0.032$, respectively, suggesting that the population VaR is underestimated by each of these methods. The $LR_{uc}$ test rejects the null hypothesis of correct unconditional coverage for the Risk Metrics and HS methods at either the 5% or the 1% significance level. For DJUSMT at the 99% confidence level, the Risk Metrics result is $\hat{p} = 0.024$, while the HS result is $\hat{p} = 0.053$ and the QR result is $\hat{p} = 0.19$. Therefore, again, in all three cases, these results suggest that the population VaR is underestimated. Furthermore, in all three cases, $LR_{uc}$ rejects the null hypothesis of correct unconditional coverage at conventional significance levels. The BRV result is $\hat{p} = 0.015$, which is much closer to the desired level of coverage.
The results for the DJI and DJUSMT indices in Table 5 show that at the 95% confidence level, the QR method is preferred on the basis of the estimated unconditional coverage, with the BRV method being the next most accurate. At the 99% confidence level, the BRV method is preferred. Again, for the other methods considered, $\hat{p} > p$ and the rejections obtained from $LR_{uc}$ suggest that the population VaR is underestimated. In this case, rejections are also obtained from $LR_{ind}$ for the BRV, QR, and HS methods at the 95% confidence level, suggesting mis-specified models.
The optimal method on the basis of the estimated unconditional coverage $\hat{p}$ is either the BRV method (optimal in four out of six cases) or the QR method (optimal in the remaining two cases).
5. Discussion
Underestimating the population VaR can lead to serious penalties for banks in countries where the Basel Ⅱ Capital Accord has been implemented. Such underestimation is a well-known weakness of the Risk Metrics method when the population conditional distribution is fat-tailed.
6. Conclusions
I propose an alternative parametric method for computing the widely used financial risk measure VaR. The BRV method involves using binary response models to compute probability forecasts of the portfolio return exceeding a grid of candidate quantile values. The candidate quantile value associated with a probability forecast closest to the desired VaR probability is chosen as the VaR. The performance of the BRV method is impressive relative to the orthodox Risk Metrics and HS methods and a QR-based method, both in Monte Carlo simulation experiments and in an empirical application involving the DJI and DJUSMT indices. In the empirical application, the BRV method is the most accurate method in most cases on the basis of the estimated unconditional coverage. Note in particular that the BRV method is the best performing method for computing the daily VaR at both the 95% and 99% confidence levels over the turbulent period 02/01/06–31/12/08. The BRV and QR methods perform similarly, but relative to QR, the BRV method has the practical advantage that conventional ML methods can be used for parameter estimation and robust inference.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conflict of interest
The authors declare no conflict of interest.