
A mixture deep neural network GARCH model for volatility forecasting

    Recently, deep neural networks have been widely used to solve financial risk modeling and forecasting challenges. Following this trend, this paper presents a mixture model for conditional volatility probability forecasting based on the deep autoregressive network and the Gaussian mixture model under the GARCH framework. An efficient algorithm for the model is developed. Both simulation and empirical results show that our model predicts conditional volatilities with smaller errors than the classical GARCH and ANN-GARCH models.

    Citation: Wenhui Feng, Yuan Li, Xingfa Zhang. A mixture deep neural network GARCH model for volatility forecasting[J]. Electronic Research Archive, 2023, 31(7): 3814-3831. doi: 10.3934/era.2023194




    Volatility forecasting is essential in asset pricing, portfolio allocation and risk management research. Early volatility forecasting was based on econometric models. The most famous are the ARCH model [1] and the GARCH model [2], which can capture volatility clustering and heavy-tail features. However, they fail to capture asymmetry, such as leverage effects. The leverage effect refers to the fact that negative returns have a larger impact on future volatility than positive returns. To overcome this drawback, the exponential GARCH (EGARCH) model [3] and the GJR model [4] were proposed. In the following years, new volatility models based on the GARCH model emerged, such as the stochastic volatility model [5] proposed by Hull and White and the realized volatility model [6] proposed by Blair et al. These form a class of GARCH-type volatility models for financial markets.

    The traditional GARCH model has strict constraints and requires the financial time series to satisfy a stationarity condition. It usually assumes that the conditional variance has a linear relationship with previous squared errors and previous variances. However, many financial time series show nonstationary and nonlinear characteristics in practice. Consequently, extended GARCH models are needed to study the volatility of such series.

    With the development of computer and big data technologies, machine learning has brought new ideas to volatility forecasting [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. In particular, artificial neural networks (ANNs) have shown outstanding performance. ANNs derive their computational ideas from biological neurons and are now widely used in various fields.

    In financial risk analysis, researchers have utilized neural networks to study the volatility of financial markets. Hamid and Iqbal [22] apply ANNs to predict the S&P500 index implied volatility, finding that the ANN's forecasting performance surpasses that of the American option pricing model. Livieris et al. [23] propose an artificial neural network model for forecasting gold prices and price trends. Additionally, Dunis and Huang [24] explore neural network regression (NNR), recurrent neural networks, and their collaborative NNR-RNN models for predicting and trading the volatility of daily exchange rates of GBP/USD and USD/JPY, with results indicating that RNNs have the best volatility forecasting performance. Beyond the direct application of neural networks, researchers have investigated a series of mixture models [25,26,27,28,29,30,31] that combine ANNs and GARCH models. Liu et al. [32] introduce a volatility forecasting model based on the recurrent neural network (RNN) and the GARCH model. Experiments reveal that such mixture models enhance the predictive capabilities of traditional GARCH models, capturing normality, skewness and kurtosis of financial volatility more accurately.

    This study employs a mixture model (DeepAR-GMM-GARCH) that combines the deep autoregressive network, the Gaussian mixture model and the GARCH model for probabilistic volatility forecasting. First, we discuss the design of the mixture model. Second, we present the model's inference and training algorithm. Third, we conduct a simulation experiment using artificial data and compare the outcomes with traditional GARCH models, finding that our model yields smaller RMSE and MAE. Last, we investigate the correlation between squared extreme values and squared returns for the CSI 300 index. The empirical data are partitioned into training and test sets. After training and testing, we analyze the prediction results and observe that the proposed model outperforms the other models in both in-sample and out-of-sample analyses.

    The key contributions of this article can be summarized as follows. First, it introduces a novel conditional volatility probability prediction model, built upon a deep autoregressive network combined with a Gaussian mixture distribution, which addresses the leptokurtic and heavy-tailed traits of financial volatility. Second, we incorporate extreme values into the mixture model via the neural network and find that their inclusion enhances the accuracy of volatility predictions.

    The structure of this paper is as follows: Section 2 outlines the GARCH model and the deep autoregressive network. Section 3 delves into the mixture model, elaborating on inference, prediction and the relevant algorithm. Section 4 presents the simulation studies. Lastly, Section 5 focuses on the empirical analysis of the proposed model.

    Stock price or stock index returns are usually regarded as nonlinear, asymmetric and heavy-tailed, and while the returns themselves are generally serially uncorrelated, their volatility clusters. Volatility clustering was first captured by Engle (1982) and Bollerslev (1986) in the ARCH and GARCH models. The GARCH model is defined as follows:

    $$r_t=\varepsilon_t\sqrt{h_t},\qquad h_t=\alpha_0+\sum_{i=1}^{q}\alpha_i r_{t-i}^2+\sum_{j=1}^{p}\beta_j h_{t-j}, \tag{2.1}$$

    where $h_t$ is the conditional heteroscedastic variance of the return series $r_t$.

    Although many criteria exist for selecting the orders p and q of a GARCH(p, q) model, the GARCH(1, 1) specification is usually sufficient to characterize conditional volatility.
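    As a point of reference (not part of the original study), a GARCH(1,1) benchmark of the kind used for comparison later can be fitted in Python with the `arch` package; the return series below is a random placeholder and the zero-mean specification is an illustrative choice.

```python
# Illustrative sketch: fitting a GARCH(1,1) benchmark with the `arch` package.
# The data here are random placeholders standing in for a real return series.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_normal(1000))  # placeholder for percentage log returns

# GARCH(1,1) with Gaussian innovations ("GARCH-n"); set dist="t" for the GARCH-t variant.
am = arch_model(returns, mean="Zero", vol="GARCH", p=1, q=1, dist="normal")
res = am.fit(disp="off")

print(res.params)                                 # omega (alpha_0), alpha[1], beta[1]
one_step_var = res.forecast(horizon=1).variance   # one-step-ahead conditional variance
```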

    The DeepAR model [33], illustrated in Figure 1, is a time series forecasting model that employs a deep autoregressive recurrent network architecture. Distinct from other time series forecasting models, DeepAR generates probabilistic predictions.

    Figure 1.  The network structure of the DeepAR model with p inputs and q outputs.

    Consider a time series $[x_1,\ldots,x_{t_0},x_{t_0+1},\ldots,x_T]:=x_{1:T}$. Given its past values $[x_1,\ldots,x_{t_0-2},x_{t_0-1}]:=x_{1:t_0-1}$, our objective is to predict the future values $[x_{t_0},x_{t_0+1},\ldots,x_T]:=x_{t_0:T}$. The DeepAR model constructs the conditional distribution $P_\Theta(x_{t_0:T}\mid x_{1:t_0-1})$ using a latent factor $z_t$, implemented by a deep recurrent network architecture. This conditional distribution comprises a product of likelihood factors:

    $$P_\Theta(x_{t_0:T}\mid x_{1:t_0-1})=\prod_{t=t_0}^{T}P_\Theta(x_t\mid x_{1:t-1})=\prod_{t=t_0}^{T}p\bigl(x_t\mid\theta(z_t,\Theta)\bigr). \tag{2.2}$$

    The likelihood $p(x_t\mid\theta(z_t))$ is a fixed distribution with parameters determined by a function $\theta(z_t,\Theta)$ of the network output $z_t$. As suggested by the model's authors, a Gaussian likelihood is appropriate for real-valued data.

    Forecasting the probability distribution of volatility in finance and economics is an important problem. There are mainly two approaches. The first relies on statistical models such as the ARCH and GARCH models, which are specifically designed to capture the dynamic nature of volatility over time and help to predict future levels of volatility based on past patterns.

    The second strategy involves machine learning models, such as neural networks, which can analyze vast amounts of data and uncover patterns that may not be readily apparent to human analysts. A case in point is the DeepAR model, a series-to-series probabilistic forecasting model. Its advantages are that it produces probabilistic forecasts and allows additional covariates to be introduced. Owing to these advantages, it can be used to predict financial volatility ($h_t$) based on the series $r_t^2$. However, the DeepAR model usually assumes that $p(x_t\mid\theta(z_t))$ (given in (2.2)) follows a Gaussian distribution, which may be unreasonable given the non-negative, leptokurtic and heavy-tailed characteristics of financial volatility. To avoid this problem, a Gaussian mixture distribution can be used to describe the density of $p(\ln(x_t)\mid\theta(z_t))$; see [34]. Motivated by these results, this paper proposes an improved mixture model: DeepAR-GMM-GARCH.

    The conditional distribution of $\ln(h_t)$ can be expressed as:

    $$P\bigl(\ln(h_t)\mid r^2_{1:t-1},\,x_{1:t-1}\bigr), \tag{3.1}$$

    where $h_t$ represents the future volatility at time $t$, $[r_1,\ldots,r_{t-2},r_{t-1}]:=r_{1:t-1}$ denotes the past return series, and $x_{1:t-1}$ refers to the covariates, which are observable at all times. The past time horizon is represented by $[1:t-1]$.

    The proposed hybrid model assumes that the conditional density of the logarithm of the volatility is given by $p(\ln(h_t)\mid r^2_{1:t-1},x_{1:t-1})$, which involves a set of latent factors, denoted $z_t$. A recurrent neural network with parameters $\Theta_1$, specifically an LSTM, encodes the squared returns $r^2_t$, the input features $x_t$ and the previous latent factors $z_{t-1}$, generating the updated latent factors $z_t$. The likelihood $p(\ln(h_t)\mid\theta(z_t))$ follows a Gaussian mixture distribution with parameters determined by a function $\theta(z_t,\Theta_2)$ of the network output $z_t$. The network architecture of the DeepAR-GMM-GARCH model is depicted in Figure 2.

    Figure 2.  The network structure of the DeepAR-GMM-GARCH model with m inputs and one output.

    Due to the complex interplay between volatility and the factors that influence it, the central component of the model assumes that the volatility $h_t$ of a time series at time $t$ is driven by the latent variable $z_{t-1}$ at time $t-1$, the squared return $r^2_{t-1}$ and the covariates $x_{t-1}$, and that $p(\ln(h_t)\mid\theta(z_t))$ follows a Gaussian mixture distribution with $K$ components. In the empirical analysis, $x_{t-1}$ will be replaced by a vector of extreme values. A nonlinear mapping function $g$ is used to establish this relationship. The DeepAR-GMM-GARCH model proposed in this paper is as follows.

    $$\begin{aligned}
    z_t&=g(z_{t-1},r^2_{t-1},x_{t-1},\Theta_1),\\
    \mu_{k,t}&=\log\bigl(1+\exp(w_{k,\mu}^{T}z_t+b_{k,\mu})\bigr),\\
    \sigma_{k,t}&=\log\bigl(1+\exp(w_{k,\sigma}^{T}z_t+b_{k,\sigma})\bigr),\\
    \pi_{k,t}&=\log\bigl(1+\exp(w_{k,\pi}^{T}z_t+b_{k,\pi})\bigr),\\
    P(\ln(h_t)\mid z_t,\Theta_2)&\sim\sum_{k=1}^{K}\pi_{k,t}\,N(\mu_{k,t},\sigma_{k,t}),\\
    r_t&=\varepsilon_t\sqrt{h_t},\qquad \sum_{k=1}^{K}\pi_{k,t}=1.
    \end{aligned} \tag{3.2}$$

    The model provides a flexible structure for nonlinear volatility prediction since the conditional distribution of the perturbation $\varepsilon_t$ can be chosen as $N(0,1)$ or $t(0,1,v)$. This gives rise to two distinct models, referred to as DeepAR-GMM-GARCH and DeepAR-GMM-GARCH-t.
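    As an illustration of the parameter heads in (3.2), the following is a minimal PyTorch-style sketch: each head applies the softplus transform $\log(1+\exp(\cdot))$ to an affine function of $z_t$, as written in the model, while the normalization of the mixture weights to sum to one is our own hypothetical way of enforcing the last constraint in (3.2).

```python
# Sketch of the parameter heads in Eq (3.2): from the LSTM state z_t to the
# mixture weights, means and scales via softplus (log(1+exp(.))) transforms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureHeads(nn.Module):
    def __init__(self, hidden_dim: int, n_components: int):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, n_components)     # w_{k,mu}, b_{k,mu}
        self.sigma = nn.Linear(hidden_dim, n_components)  # w_{k,sigma}, b_{k,sigma}
        self.pi = nn.Linear(hidden_dim, n_components)     # w_{k,pi}, b_{k,pi}

    def forward(self, z_t: torch.Tensor):
        mu_t = F.softplus(self.mu(z_t))        # softplus heads as written in Eq (3.2)
        sigma_t = F.softplus(self.sigma(z_t))
        pi_raw = F.softplus(self.pi(z_t))
        # Hypothetical normalization so that the weights satisfy sum_k pi_{k,t} = 1.
        pi_t = pi_raw / pi_raw.sum(dim=-1, keepdim=True)
        return pi_t, mu_t, sigma_t
```

    In this reading, the LSTM plays the role of $g$ in (3.2) and the heads play the role of $\theta(z_t,\Theta_2)$.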

    Assuming instead that $p(\ln(h_t)\mid\theta(z_t))$ follows a single Gaussian distribution, model (3.2) reduces to a simpler version:

    $$\begin{aligned}
    z_t&=g(z_{t-1},r^2_{t-1},x_{t-1},\Theta_1),\\
    \mu_t&=\log\bigl(1+\exp(w_{\mu}^{T}z_t+b_{\mu})\bigr),\\
    \sigma_t&=\log\bigl(1+\exp(w_{\sigma}^{T}z_t+b_{\sigma})\bigr),\\
    P(\ln(h_t)\mid z_t,\Theta_2)&\sim N(\mu_t,\sigma^2_t),\\
    r_t&=\varepsilon_t\sqrt{h_t}.
    \end{aligned} \tag{3.3}$$

    For simplicity, we refer to this as the DeepAR-GARCH model.

    For a given time series, our goal is to estimate the parameters $\Theta_1$ of the LSTM cells and the parameters $\Theta_2$ of the function $\theta$, which applies an affine transformation followed by a softplus activation. We employ a quasi-maximum likelihood estimation method with the objective $\Theta^{*}=\arg\max_{\Theta}\sum_{i}\log p(\tilde h_i\mid\Theta_1,\Theta_2)$. Inference based on this likelihood function requires taking the latent variable $z_t$ into account.

    The training procedure of the model is as follows. First, we use the BIC criterion to identify the number of mixture components, $K$, for all samples. Each data point is assigned a label from 1 to $K$, and each cluster $k$ has its own mean vector and covariance matrix. Based on these results, we set the initial $\pi_k$ to the proportion of data points labelled $k$, and the initial mean vector $\mu_k$ and covariance matrix $\Sigma_k$ to the mean vector and covariance matrix within cluster $k$. The resulting parameter values $(\tilde\pi_{k,0},\tilde\mu_{k,0},\tilde\sigma^2_{k,0})$ from the initial clustering are used to pre-train the DeepAR-GMM-GARCH model, which allows it to converge quickly. Next, we partition the training data into multiple batches, select a sample from a batch and use $(r^2_{t_0-m},\ldots,r^2_{t_0-1})$ as the input of the DeepAR-GMM-GARCH model. The model produces a set of $\tilde\pi_{k,t},\tilde\mu_{k,t},\tilde\sigma^2_{k,t}$, after which we sample from this Gaussian mixture, compute the loss and update the parameters by gradient descent. Since direct differentiation of the sampling step is infeasible, we apply the reparameterization trick to adjust the model's parameters. We continue this training process until the end of the training cycle. Last, we feed the training-set samples into the trained model for prediction evaluation: the model sequentially computes the latent variable and the mixture parameters, samples from the resulting mixture, and the prediction results are derived from the sampled values.

    The training algorithm is shown in Algorithm 1.

    Algorithm 1 Training Procedure for the DeepAR-GMM-GARCH Mixture Model
    1: for each batch do
    2:   for each $t \in [t_0-m, t_0-1]$ do
    3:     if $t = t_0-m$ then
    4:       $z_{t-1}=0$
    5:     else
    6:       $z_t=g(z_{t-1},r^2_{t-1},x_{t-1},\Theta_1)$
    7:     end if
    8:     for each $k \in [1,K]$ do
    9:       $\tilde\mu_{k,t}=\log\bigl(1+\exp(w_{k,\mu}^{T}z_t+b_{k,\mu})\bigr)$
    10:      $\tilde\sigma^2_{k,t}=\log\bigl(1+\exp(w_{k,\sigma}^{T}z_t+b_{k,\sigma})\bigr)$
    11:      $\tilde\pi_{k,t}=\log\bigl(1+\exp(w_{k,\pi}^{T}z_t+b_{k,\pi})\bigr)$
    12:    end for
    13:    sample $\ln(\tilde h_t)\sim \mathrm{GMM}(\tilde\pi_{k,t},\tilde\mu_{k,t},\tilde\sigma^2_{k,t})$
    14:  end for
    15:  compute the loss and update the model parameters $\Theta_1$, $\Theta_2$ by gradient descent
    16: end for
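    To make the sampling step in line 13 of Algorithm 1 concrete, the sketch below draws $\ln(\tilde h_t)$ from the fitted Gaussian mixture using the reparameterization trick: under our own assumptions about tensor shapes, the component index is drawn without a gradient, while the Gaussian part is written as $\mu + \sigma\varepsilon$ so that gradients flow back to the mixture parameters.

```python
# Sketch of the GMM sampling step (line 13 of Algorithm 1) with the
# reparameterization trick: no gradient through the component choice, but a
# differentiable Gaussian draw for the selected component.
import torch

def sample_log_h(pi_t: torch.Tensor, mu_t: torch.Tensor, sigma_t: torch.Tensor) -> torch.Tensor:
    """pi_t, mu_t, sigma_t have shape (batch, K); returns ln(h_tilde) of shape (batch,)."""
    k = torch.multinomial(pi_t, num_samples=1)               # component index, (batch, 1)
    mu_k = torch.gather(mu_t, dim=1, index=k).squeeze(1)
    sigma_k = torch.gather(sigma_t, dim=1, index=k).squeeze(1)
    eps = torch.randn_like(mu_k)                             # reparameterized draw
    return mu_k + sigma_k * eps

# Inside the training loop (names are illustrative):
#   log_h = sample_log_h(pi_t, mu_t, sigma_t)
#   h_tilde = torch.exp(log_h)
#   loss = gaussian_nll(h_tilde, r_sq)   # Eq (3.4); see the loss functions below
#   loss.backward(); optimizer.step()
```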

    During the training process, the definition of the loss function determines the prediction quality of the model. We use the average negative log-likelihood as the loss function, $\mathrm{Loss}=L_h$. In the GARCH framework, $\varepsilon_t$ is usually assumed to follow a Gaussian or a Student-t distribution, and the corresponding loss functions are as follows.

    (1) When $\varepsilon_t\sim N(0,1)$, the loss function is:

    $$L_h=\frac{1}{N}\sum_{t=1}^{N}\left[\log\bigl(\sqrt{\tilde h_t}\bigr)+\frac{r_t^2}{2\tilde h_t}\right]. \tag{3.4}$$

    (2) When $\varepsilon_t\sim t(0,1,v)$, the loss function is:

    $$L_h=\frac{1}{N}\sum_{t=1}^{N}\left[\log\bigl(\sqrt{\tilde h_t}\bigr)+\frac{1}{2}(v+1)\log\left(1+\frac{r_t^2}{\tilde h_t(v-2)}\right)\right]. \tag{3.5}$$

    To calculate the above loss functions, we need samples of $\tilde h_t$ obtained from Algorithm 1 in Section 3.2.1. In practice, if $\varepsilon_t$ follows another distribution, the idea of QMLE implies that the loss function in (3.4) can still be used; see Liu and So (2020) [32].
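    Under the forms (3.4) and (3.5) above, the two loss functions can be written as short functions; this is a sketch with hypothetical names, and additive constants of the log-likelihoods are dropped.

```python
# Sketches of the loss functions (3.4) and (3.5); additive constants are omitted.
import torch

def gaussian_nll(h_tilde: torch.Tensor, r_sq: torch.Tensor) -> torch.Tensor:
    """Average negative log-likelihood when eps_t ~ N(0,1), Eq (3.4)."""
    return torch.mean(torch.log(torch.sqrt(h_tilde)) + r_sq / (2.0 * h_tilde))

def student_t_nll(h_tilde: torch.Tensor, r_sq: torch.Tensor, v: float = 6.0) -> torch.Tensor:
    """Average negative log-likelihood when eps_t ~ t(0,1,v), Eq (3.5)."""
    return torch.mean(
        torch.log(torch.sqrt(h_tilde))
        + 0.5 * (v + 1.0) * torch.log1p(r_sq / (h_tilde * (v - 2.0)))
    )
```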

    Experiments are carried out on volatility inference using simulated time series. These series are flexible, with both volatility and mixing coefficients changing over time, as detailed below:

    $$\begin{aligned}
    r_t&=\varepsilon_t\sqrt{h_t},\qquad \varepsilon_t\sim N(0,1),\\
    p(h_t\mid\mathcal{F}_{t-1})&=\eta_{1,t}\,\phi(\mu_t,\sigma^2_{1,t})+\eta_{2,t}\,\phi(\mu_t,\sigma^2_{2,t}),\\
    \mu_t&=a_0+a_1 h_{t-1},\\
    \sigma^2_{1,t}&=\alpha_{01}+\alpha_{11}r^2_{t-1}+\beta_1\sigma^2_{1,t-1},\\
    \sigma^2_{2,t}&=\alpha_{02}+\alpha_{12}r^2_{t-1}+\beta_2\sigma^2_{2,t-1},\\
    \pi_{1,t}&=c_0+c_1 h_{t-1},\\
    \eta_{1,t}&=\exp(\pi_{1,t})/\bigl(1+\exp(\pi_{1,t})\bigr),
    \end{aligned} \tag{4.1}$$

    where $\mathcal{F}_{t-1}$ denotes the information set through time $t-1$ and $\phi$ is the Gaussian density function. $\eta_{1,t}$ and $\eta_{2,t}$ are the mixing coefficients of the two Gaussian components and satisfy $\eta_{2,t}=1-\eta_{1,t}$. When generating the simulation data, we set $\alpha_{01}=0.01$, $\alpha_{11}=0.1$, $\beta_1=0.15$, $\alpha_{02}=0.04$, $\alpha_{12}=0.15$, $\beta_2=0.82$, $c_0=0.02$, $c_1=0.90$, $a_0=0.02$, $a_1=0.6$. The time series has initial values $r_0=0.1$, $\sigma^2_0=0$, $h_0=0$. Sample sizes of T = 500, 1000 and 1500 are considered, and the number of replications is 1000.
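    For reproducibility, a small sketch of the data-generating process (4.1) with the parameter values above is given below; drawing $h_t$ from the two-component mixture and clipping it at a small positive value (so that $\sqrt{h_t}$ is defined) are our own implementation choices.

```python
# Sketch of the simulation design in Eq (4.1) with the stated parameter values.
import numpy as np

def simulate_series(T: int = 1000, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    a0, a1 = 0.02, 0.6
    a01, a11, b1 = 0.01, 0.10, 0.15      # alpha_01, alpha_11, beta_1
    a02, a12, b2 = 0.04, 0.15, 0.82      # alpha_02, alpha_12, beta_2
    c0, c1 = 0.02, 0.90
    r, h = 0.1, 0.0                      # initial values r_0, h_0
    s1, s2 = 0.0, 0.0                    # initial sigma^2_{1,0}, sigma^2_{2,0}
    out = np.empty(T)
    for t in range(T):
        mu = a0 + a1 * h
        s1 = a01 + a11 * r**2 + b1 * s1
        s2 = a02 + a12 * r**2 + b2 * s2
        pi1 = c0 + c1 * h
        eta1 = np.exp(pi1) / (1.0 + np.exp(pi1))
        var = s1 if rng.uniform() < eta1 else s2        # pick a mixture component
        h = max(rng.normal(mu, np.sqrt(var)), 1e-8)     # draw h_t, clipped to stay positive
        r = rng.standard_normal() * np.sqrt(h)          # r_t = eps_t * sqrt(h_t)
        out[t] = r
    return out

returns = simulate_series(T=1000)
```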

    For the series simulated from (4.1), we apply three models to forecast their volatility: the GARCH model with $\epsilon_t\sim N(0,1)$ (GARCH-n), the GARCH model with $\epsilon_t\sim t(0,1,v)$ (GARCH-t) and the DeepAR-GMM-GARCH model. Using an MCMC sampling method, the degrees of freedom of the Student-t distribution are determined to be 6. For the DeepAR-GMM-GARCH model, we use a recurrent neural network with three LSTM layers and 24 hidden nodes. For the number of input nodes m in Figure 2, we use a grid search to find the optimal value. We use the BIC rule to choose K based on the Mclust package [35]. The model's hyperparameters are tuned with Optuna, a widely used automatic hyperparameter optimization framework.
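    Hyperparameter tuning with Optuna can be sketched as follows; the search space and the `fit_and_validate` routine are illustrative placeholders rather than the exact configuration used in this paper.

```python
# Illustrative Optuna sketch for tuning network hyperparameters; the search
# space and the validation routine are hypothetical placeholders.
import optuna

def fit_and_validate(hidden_nodes: int, lstm_layers: int, lr: float) -> float:
    """Placeholder: train the DeepAR-GMM-GARCH network and return a validation loss."""
    return (hidden_nodes - 24) ** 2 * 1e-3 + lstm_layers * 0.01 + lr  # dummy value

def objective(trial: optuna.Trial) -> float:
    hidden = trial.suggest_int("hidden_nodes", 8, 48)
    layers = trial.suggest_int("lstm_layers", 1, 3)
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    return fit_and_validate(hidden_nodes=hidden, lstm_layers=layers, lr=lr)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```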

    Table 1 reports the in-sample errors (RMSE and MAE) of the three volatility forecasting models. The GARCH-n and GARCH-t models show similar performance, while our DeepAR-GMM-GARCH model stands out; all models obtain lower RMSE as the sample size increases. These results suggest that the proposed estimation is asymptotically convergent. Table 2 reports the average out-of-sample errors of the three volatility forecasting models. Similar to the in-sample results, our DeepAR-GMM-GARCH model is superior to the GARCH-n and GARCH-t models.

    Table 1.  The average in-sample errors of the GARCH-n, the GARCH-t and the DeepAR-GMM-GARCH models.
    Sample size   Model              RMSE     MAE
    T = 500       GARCH-n            0.1364   0.0628
                  GARCH-t            0.1211   0.0591
                  DeepAR-GMM-GARCH   0.1068   0.0548
    T = 1000      GARCH-n            0.0621   0.0437
                  GARCH-t            0.0578   0.0419
                  DeepAR-GMM-GARCH   0.0398   0.0331
    T = 1500      GARCH-n            0.0604   0.0428
                  GARCH-t            0.0652   0.0401
                  DeepAR-GMM-GARCH   0.0300   0.0325
    Note: Number of replications = 1000.

    Table 2.  The average out-of-sample errors of the GARCH-n, the GARCH-t and the DeepAR-GMM-GARCH models.
    Sample size   Model              RMSE     MAE
    T = 500       GARCH-n            0.3564   0.3028
                  GARCH-t            0.3271   0.3091
                  DeepAR-GMM-GARCH   0.2761   0.2311
    T = 1000      GARCH-n            0.2619   0.2117
                  GARCH-t            0.2318   0.2033
                  DeepAR-GMM-GARCH   0.2091   0.1834
    T = 1500      GARCH-n            0.2241   0.2179
                  GARCH-t            0.2213   0.1971
                  DeepAR-GMM-GARCH   0.1911   0.1722
    Note: Number of replications = 1000.


    A composite stock index reflects the average performance of the whole financial market. In this section, we study the daily OHLC data of the Shanghai-Shenzhen CSI 300 index. The OHLC data contain the daily high, low, open and close prices. Scholars have pointed out that combining the open, high, low and close prices yields more efficient volatility estimates. Hence, we also introduce OHLC data into our mixture model.

    The CSI 300 index data studied in this paper cover January 4, 2010 to December 30, 2021, a total of 2916 trading days. Let $r_t$ be the return series, calculated from the closing price series $C_t$ of the CSI 300 index:

    $$r_t=100\log\frac{C_{t+1}}{C_t}. \tag{5.1}$$

    The time series of $r_t$ and $r_t^2$ are plotted in Figure 3. We can see that $r_t^2$ displays significant volatility clustering and that the amplitude of the volatility gradually decreases.

    Figure 3.  The time series of $r_t$ and $r_t^2$ for the CSI 300 index, as defined in (5.1), from January 4, 2010 to December 30, 2021.

    The collected data are divided into a training set and a test set. Table 3 reports the descriptive statistics of the training and test data. The mean of $r_t$ is relatively small, only 0.007465, while the standard deviation is 2.251006, indicating a large degree of variation. The skewness is less than 0 and the kurtosis is greater than 3, indicating that the series is left-skewed and heavy-tailed. The test data share the same characteristics as the training data: dispersed, highly volatile, left-skewed and more peaked than the normal distribution. This suggests that the normal distribution may not be suitable for our data, and that heavy-tailed alternatives, such as the t distribution or a mixture of normals, could be more appropriate.

    Table 3.  Descriptive statistics for the training and test sets of CSI 300 returns.
    Data Set        Period                     Mean       Std.       Skew.       Kurt.
    training data   04/01/2010 to 29/12/2017   0.007465   2.251006   −0.755580   5.136707
    test data       02/01/2018 to 30/12/2021   0.019125   1.717302   −0.427819   3.248347


    Besides the close price ($C_t$), we also introduce the high price ($H_t$), the open price ($O_t$) and the low price ($L_t$). Define $u_t=(H_t-O_t)^2$, $d_t=(L_t-O_t)^2$ and $c_t=(C_t-O_t)^2$.

    The correlation matrix (5.2) below shows the correlation coefficients between $u_t$, $d_t$, $c_t$ and $r_t^2$. The pairs $(u_t, r_t^2)$, $(d_t, r_t^2)$ and $(c_t, r_t^2)$ all exhibit large correlation coefficients. Intuitively, large values of $u_t$, $d_t$ and $c_t$ usually indicate large volatility ($r_t^2$). However, classical volatility models do not take such extreme values into account. It is therefore reasonable to use a neural network together with $u_t$, $d_t$ and $c_t$ to forecast volatility, because a neural network can incorporate additional covariates and capture the complex relations between them.

    $$\begin{array}{c|cccc}
     & r_t^2 & u_t & d_t & c_t\\\hline
    r_t^2 & 1.000 & 0.394 & 0.475 & 0.716\\
    u_t & 0.394 & 1.000 & 0.012 & 0.556\\
    d_t & 0.475 & 0.012 & 1.000 & 0.690\\
    c_t & 0.716 & 0.556 & 0.690 & 1.000
    \end{array} \tag{5.2}$$
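    As an illustration, the return series (5.1) and the extreme-value covariates can be built from daily OHLC data as sketched below; the column names are placeholders, and $u_t$, $d_t$, $c_t$ follow the squared-difference definitions given above.

```python
# Sketch: building r_t, u_t, d_t and c_t from daily OHLC data (column names are placeholders).
import numpy as np
import pandas as pd

def build_features(ohlc: pd.DataFrame) -> pd.DataFrame:
    """`ohlc` is assumed to have columns 'open', 'high', 'low', 'close' indexed by trading day."""
    out = pd.DataFrame(index=ohlc.index)
    out["r"] = 100.0 * np.log(ohlc["close"].shift(-1) / ohlc["close"])   # Eq (5.1)
    out["r_sq"] = out["r"] ** 2
    out["u"] = (ohlc["high"] - ohlc["open"]) ** 2
    out["d"] = (ohlc["low"] - ohlc["open"]) ** 2
    out["c"] = (ohlc["close"] - ohlc["open"]) ** 2
    return out.dropna()

# Correlations with the squared returns, as in matrix (5.2):
#   build_features(ohlc)[["r_sq", "u", "d", "c"]].corr()
```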

    This paper uses four evaluation indicators to measure the predictive performance of the models: NMAE, HR, the linear correlation coefficient and the rank correlation coefficient. The first two are defined as follows:

    $$\mathrm{NMAE}=\frac{\sum_{t=1}^{N}\bigl|r^2_{t+1}-\tilde h_{t+1}\bigr|}{\sum_{t=1}^{N}\bigl|r^2_{t+1}-r^2_{t}\bigr|}, \tag{5.3}$$
    $$\mathrm{HR}=\frac{1}{N}\sum_{t=1}^{N}\theta_t,\qquad \theta_t=\begin{cases}1, & (\tilde h_{t+1}-r^2_t)(r^2_{t+1}-r^2_t)\geq 0,\\ 0, & \text{otherwise},\end{cases} \tag{5.4}$$

    where N represents the number of predicted samples. Both NMAE and HR values range between 0 and 1. The smaller the values of these two indicators, the better the model's performance.

    Scholars usually use volatility estimates from high-frequency data as a proxy for actual volatility when evaluating forecasting models. We also use the realized volatility ($\sigma^2_{RV,t}$) as a proxy for actual volatility, calculated by summing the squared 5-minute intra-day returns:

    $$\sigma^2_{RV,t}=\sum_{i=1}^{48}\bigl[\log r_{t,i}-\log r_{t,i-1}\bigr]^2. \tag{5.5}$$

    We focus on the out-of-sample predictive performance of the models, so the correlation between the realized volatilities $\sigma^2_{RV,t+1}$ and the predicted volatilities $\tilde h_{t+1}$ is measured only on the test set. We calculate Pearson's coefficient

    $$r=\frac{\sum_{t=1}^{N}\bigl(\sigma^2_{RV,t+1}-\overline{\sigma^2_{RV}}\bigr)\bigl(\tilde h_{t+1}-\overline{\tilde h}\bigr)}{\sqrt{\sum_{t=1}^{N}\bigl(\sigma^2_{RV,t+1}-\overline{\sigma^2_{RV}}\bigr)^2}\,\sqrt{\sum_{t=1}^{N}\bigl(\tilde h_{t+1}-\overline{\tilde h}\bigr)^2}}, \tag{5.6}$$

    where $\overline{\sigma^2_{RV}}$ and $\overline{\tilde h}$ denote the respective mean values. We also compute Spearman's rank-order correlation coefficient $r_s$, which is calculated using Eq (5.6) with the volatilities replaced by their ranks. Spearman's rank-order correlation coefficient is considered more robust than Pearson's coefficient. Both $r$ and $r_s$ lie between $-1$ and $1$; a value of $r$ ($r_s$) around 0 means that the realized and predicted volatilities are uncorrelated.
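    The four evaluation measures can be computed as in the sketch below, where the alignment of the prediction and realization arrays is our own indexing convention and `scipy.stats` supplies Pearson's $r$ and Spearman's $r_s$.

```python
# Sketch of the evaluation measures: NMAE (5.3), HR (5.4) and the correlations (5.6).
import numpy as np
from scipy.stats import pearsonr, spearmanr

def nmae(r_sq: np.ndarray, h_pred: np.ndarray) -> float:
    """r_sq and h_pred are aligned in time; the naive benchmark forecast is r_t^2."""
    return float(np.sum(np.abs(r_sq[1:] - h_pred[1:])) / np.sum(np.abs(r_sq[1:] - r_sq[:-1])))

def hit_rate(r_sq: np.ndarray, h_pred: np.ndarray) -> float:
    """Share of steps where the predicted and realized changes have the same sign."""
    hits = (h_pred[1:] - r_sq[:-1]) * (r_sq[1:] - r_sq[:-1]) >= 0
    return float(np.mean(hits))

def correlations(rv: np.ndarray, h_pred: np.ndarray):
    """Pearson's r and Spearman's r_s between realized and predicted volatilities."""
    r, _ = pearsonr(rv, h_pred)
    rs, _ = spearmanr(rv, h_pred)
    return r, rs
```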

    Simulation experiments demonstrate that our proposed model exhibits greater prediction accuracy than the GARCH model. In this section, to highlight the advantages of our model, we compare it to the classic GARCH model, ANN-GARCH (an existing neural network GARCH model) and the DeepAR-GARCH model using empirical data.

    Among the four models, the GARCH and ANN-GARCH models predict conditional volatility, whereas the DeepAR-GARCH and DeepAR-GMM-GARCH models provide probabilistic forecasts for conditional volatility. To facilitate comparison, we will calculate the mean and quantiles of the probability density function for conditional volatility derived from the DeepAR-GARCH and DeepAR-GMM-GARCH models.

    The estimated parameters of the GARCH models (GARCH-n and GARCH-t) are summarized in Tables 4 and 5. For the GARCH-t model, the degrees-of-freedom parameter $v$ is estimated at around 6. Both GARCH models are estimated with a high value of $\beta_1$ and a low value of $\alpha_1$, and the sum of $\alpha_1$ and $\beta_1$ is almost 1, which implies the series may be non-stationary. Therefore, our model, which does not impose the stationarity constraint, is more suitable. The ANN-GARCH model employs a three-layer ANN structure, featuring two input nodes, 24 hidden nodes and a single output node. Likewise, the DeepAR-GARCH and DeepAR-GMM-GARCH models also have a three-layer design, consisting of 14 input nodes, 24 hidden nodes and output layers with two and five output nodes, respectively.

    Table 4.  Parameter estimation of GARCH-n model.
    Data Set   $\alpha_0$    $\alpha_1$   $\beta_1$
    CSI 300    1.8365e−04    0.1000       0.8800

    Table 5.  Parameter estimation of GARCH-t model.
    Data Set   $\alpha_0$    $\alpha_1$   $\beta_1$   $\nu$
    CSI 300    1.8103e−04    0.1000       0.8400      6.4625


    Table 6 lists the performance of the five volatility prediction models. In the in-sample study, the DeepAR-GMM-GARCH model has the smallest HR and loss, and the DeepAR-GARCH model has the smallest NMAE. Overall, the volatility prediction performance of the DeepAR-GMM-GARCH model is better than that of the traditional GARCH and DeepAR-GARCH models.

    Table 6.  In-sample forecasting results for the GARCH-n, GARCH-t, ANN-GARCH, DeepAR-GARCH and DeepAR-GMM-GARCH models.
    Data Set    Model              Loss    NMAE    HR
    In-sample   GARCH-n            1.401   0.763   0.704
                GARCH-t            1.331   0.761   0.637
                ANN-GARCH          1.603   0.827   0.690
                DeepAR-GARCH       1.541   0.748   0.717
                DeepAR-GMM-GARCH   1.311   0.751   0.630


    In Figure 4, we display a portion of the forecasting results from the various models and compare them with $r_t^2$. As shown in (a), the forecasts from the GARCH models differ from $r_t^2$: the GARCH models fail to capture significant changes in $r_t^2$. From (b) and (c), it is evident that the neural network models capture the trend of $r_t^2$ and predict large fluctuations more accurately, with the DeepAR-GMM-GARCH model demonstrating the best performance. In (d), we observe that the estimated 90% quantiles from the DeepAR-GMM-GARCH model appear to be closer to the observations ($r_t^2$).

    Figure 4.  Subplots (a), (b), (c) and (d) show the in-sample volatility forecasting results of the classic GARCH, ANN-GARCH, DeepAR-GARCH and DeepAR-GMM-GARCH models. (a) compares the volatility forecasts from the GARCH models. (b) compares the volatility forecasts of the ANN-GARCH model with $r_t^2$. (c) compares the probabilistic forecasts of the DeepAR-GARCH and DeepAR-GMM-GARCH models. (d) compares the 90% quantile forecasts of the DeepAR-GARCH and DeepAR-GMM-GARCH models.

    To sum up, the results in Table 6 and the plots in Figure 4 show that introducing the extreme values ($u_t$, $d_t$, $c_t$) helps to improve the forecasting accuracy of the mixture volatility models. Hence, the proposed approach is of particular practical value.

    In the out-of-sample analysis, as discussed in Section 5.1, the test set comprises the time series of 972 trading days subsequent to the respective training set.

    Table 7 presents the performance of the models using common error measures (loss function, NMAE and HR). The DeepAR-GARCH model attains a lower loss function value than the other models on the test set. The neural network models generally display lower NMAE and HR values than the GARCH models, and the DeepAR-GMM-GARCH model exhibits the lowest NMAE and HR values on the test data set.

    Table 7.  Out-of-sample forecasting results for the GARCH-n, GARCH-t, ANN-GARCH, DeepAR-GARCH and DeepAR-GMM-GARCH models.
    Data Set     Model              Loss    NMAE    HR
    Out-sample   GARCH-n            2.320   0.917   0.868
                 GARCH-t            2.008   0.915   0.859
                 ANN-GARCH          2.517   0.903   0.783
                 DeepAR-GARCH       1.916   0.929   0.801
                 DeepAR-GMM-GARCH   2.100   0.790   0.722


    In Figure 5, we plot part of the forecasting results from the five models on the out-of-sample data set and compare them with $r_t^2$. It can be seen from (a) that the GARCH models do not capture significant changes in $r_t^2$, consistent with the in-sample results. From (b) and (c), the neural network models capture most of the fluctuations of $r_t^2$ well, with the DeepAR-GMM-GARCH model performing the best.

    Figure 5.  Subplots (a), (b), (c) and (d) show the out-of-sample volatility forecasting results of the classic GARCH, ANN-GARCH, DeepAR-GARCH and DeepAR-GMM-GARCH models. (a) compares the volatility forecasts from the GARCH models. (b) compares the volatility forecasts of the ANN-GARCH model with the realized volatility ($r_t^2$). (c) compares the probabilistic forecasts of the DeepAR-GARCH and DeepAR-GMM-GARCH models. (d) compares the 90% quantile forecasts of the DeepAR-GARCH and DeepAR-GMM-GARCH models.

    From (d), we find that the estimated 90% quantiles from the DeepAR-GMM-GARCH model are more closely aligned with the observations ($r_t^2$).

    Section 5.2 introduces the linear correlation $r$ and the rank correlation $r_s$ as two measures for comparing predicted and realized volatilities. The values of $r$ and $r_s$ between the predicted and realized volatilities on the test set are reported in Table 8. On average, the DeepAR-GMM-GARCH model shows the best performance of all models, obtaining the highest rank correlation on the test set. Rank correlation is more robust than linear correlation since it detects correlations nonparametrically.

    Table 8.  A comparison of linear correlation (r) and rank correlation (rs) between the realized volatility and volatility predicted by GARCH-n, GARCH-t, DeepAR-GARCH, ANN-GARCH and DeepAR-GMM-GARCH models for test data set on the CSI300 index. The best model (the highest correlation) is underlined.
    Out-sample   GARCH-n   GARCH-t   ANN-GARCH   DeepAR-GARCH   DeepAR-GMM-GARCH
    r            0.381     0.420     0.490       0.473          0.504
    rs           0.477     0.502     0.500       0.516          0.527


    This paper studies a mixture volatility forecasting model based on an autoregressive neural network and the GARCH model in order to obtain more precise forecasts of conditional volatility. The inference, loss functions and training algorithm of the mixture model are given. The simulation results show that our model performs with smaller errors than the classic GARCH models. The empirical study based on the CSI 300 index shows that, with extreme values included, our model significantly improves forecasting accuracy compared to the usual models.

    Our research findings can offer valuable insights into the prediction of volatility uncertainty. In future studies, our model can be employed for various high-frequency volatility analyses, where it is anticipated to exhibit enhanced performance.

    This work is partially supported by Guangdong Basic and Applied Basic Research Foundation (2022A1515010046) and Funding by Science and Technology Projects in Guangzhou (SL2022A03J00654).

    The authors declare no conflict of interest.



    [1] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica, 50 (1982), 987–1007. https://doi.org/10.2307/1912773 doi: 10.2307/1912773
    [2] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econom., 31 (1986), 307–327. https://doi.org/10.1016/0304-4076(86)90063-1 doi: 10.1016/0304-4076(86)90063-1
    [3] D. B. Nelson, Conditional heteroskedasticity in asset returns: A new approach, Econometrica, 59 (1991), 347–370. https://doi.org/10.2307/2938260 doi: 10.2307/2938260
    [4] L. R. Glosten, R. Jagannathan, D. E. Runkle, On the relation between the expected value and the volatility of the nominal excess return on stocks, J. Financ., 48 (1993), 1779–1801. https://doi.org/10.1111/j.1540-6261.1993.tb05128.x doi: 10.1111/j.1540-6261.1993.tb05128.x
    [5] J. Hull, A. White, The pricing of options on assets with stochastic volatilities, J. Financ., 42 (1987), 281–300. https://doi.org/10.1111/j.1540-6261.1987.tb02568.x doi: 10.1111/j.1540-6261.1987.tb02568.x
    [6] B. J. Blair, S. H. Poon, S. J. Taylor, Forecasting S & P 100 volatility: the incremental information content of implied volatilities and high-frequency index returns, in Handbook of Quantitative Finance and Risk Management, Springer, (2010), 1333–1344. https://doi.org/10.1007/978-0-387-77117-5
    [7] F. Audrino, D. Colangelo, Semi-parametric forecasts of the implied volatility surface using regression trees, Stat. Comput., 20 (2010), 421–434. https://doi.org/10.1007/s11222-009-9134-y doi: 10.1007/s11222-009-9134-y
    [8] C. Luong, N. Dokuchaev, Forecasting of realised volatility with the random forests algorithm, J. Risk Financial Manag., 11 (2018), 61. https://doi.org/10.3390/jrfm11040061 doi: 10.3390/jrfm11040061
    [9] S. Mittnik, N. Robinzonov, M. Spindler, Stock market volatility: identifying major drivers and the nature of their impact, J. Bank Financ., 58 (2015), 1–14. https://doi.org/10.1016/j.jbankfin.2015.04.003 doi: 10.1016/j.jbankfin.2015.04.003
    [10] Z. Li, B. Mo, H. Nie, Time and frequency dynamic connectedness between cryptocurrencies and financial assets in China, Int. Rev. Econ. Financ., 86 (2023), 46–57. http://dx.doi.org/10.1016/j.iref.2023.01.015 doi: 10.1016/j.iref.2023.01.015
    [11] Z. Li, H. Dong, C. Floros, A. Charemis, P. Failler, Re-examining bitcoin volatility: a CAViaR-based approach, Int. Rev. Econ. Financ., 58 (2022), 1320–1338. http://dx.doi.org/10.1080/1540496X.2021.1873127 doi: 10.1080/1540496X.2021.1873127
    [12] Z. Li, C. Yang, Z. Huang, How does the fintech sector react to signals from central bank digital currencies, Financ. Res. Lett., 50 (2022), 103308. http://dx.doi.org/10.1016/j.frl.2022.103308 doi: 10.1016/j.frl.2022.103308
    [13] Z. Li, L. Chen, H. Dong, What are bitcoin market reactions to its-related events, Int. Rev. Econ. Financ., 73 (2021), 1–10. http://dx.doi.org/10.1016/j.iref.2020.12.020 doi: 10.1016/j.iref.2020.12.020
    [14] T. Li, J. Wen, D. Zeng, K. Liu, Has enterprise digital transformation improved the efficiency of enterprise technological innovation? A case study on Chinese listed companies, Math. Biosci. Eng., 19 (2022), 12632–12654. http://dx.doi.org/10.3934/mbe.2022590 doi: 10.3934/mbe.2022590
    [15] Y. Liu, P. Failler, Z. Liu, Impact of environmental regulations on energy efficiency: a case study of China's air pollution prevention and control action plan, Sustainability, 14 (2022), 3168. http://dx.doi.org/10.3390/su14063168 doi: 10.3390/su14063168
    [16] Y. Liu, Z. Li, M. Xu, The influential factors of financial cycle spillover: evidence from China, Emerg. Mark. Financ. Tr., 56 (2020), 1336–1350. http://dx.doi.org/10.1080/1540496X.2019.1658076 doi: 10.1080/1540496X.2019.1658076
    [17] D. G. Kirikos, An evaluation of quantitative easing effectiveness based on out-of-sample forecasts, Natl. Account. Rev., 4 (2022), 378–389. https://dx.doi.org/10.3934/NAR.2022021 doi: 10.3934/NAR.2022021
    [18] J. Saleemi, COVID-19 and liquidity risk, exploring the relationship dynamics between liquidity cost and stock market returns, Natl. Account. Rev., 3 (2021), 218–236. https://dx.doi.org/10.3934/NAR.2021011 doi: 10.3934/NAR.2021011
    [19] S. A. Gyamerah, B. E. Owusu, E. K. Akwaa-Sekyi, Modelling the mean and volatility spillover between green bond market and renewable energy stock market, Green Finance, 4 (2022), 310–328. https://dx.doi.org/10.3934/GF.2022015 doi: 10.3934/GF.2022015
    [20] H. Siddiqi, Financial market disruption and investor awareness: the case of implied volatility skew, Quant. Finance Econ., 6 (2022), 505–517. https://dx.doi.org/10.3934/QFE.2022021 doi: 10.3934/QFE.2022021
    [21] L. Li, X. Zhang, Y. Li, C. Deng, Daily GARCH model estimation using high frequency data, J. Guangxi Norm. Univ., Nat. Sci., 39 (2021), 1181–1191.
    [22] S. A. Hamid, Z. Iqbal, Using neural networks for forecasting volatility of S & P 500 index futures prices, J. Bus. Res., 57 (2004), 1116–1125. https://doi.org/10.1016/S0148-2963(03)00043-2 doi: 10.1016/S0148-2963(03)00043-2
    [23] I. E. Livieris, E. Pintelas, P. Pintelas, A CNN-LSTM model for gold price time-series forecasting, Neural Comput. Appl., 32 (2020), 17351–17360. https://doi.org/10.1007/s00521-020-04867-x doi: 10.1007/s00521-020-04867-x
    [24] C. L. Dunis, X. Huang, Forecasting and trading currency volatility: an application of recurrent neural regression and model combination, J. Forecast., 21 (2002), 317–354. https://doi.org/10.1002/for.833 doi: 10.1002/for.833
    [25] R. G. Donaldson, M. Kamstra, An artificial neural network-GARCH model for international stock return volatility, J. Empir. Financ., 4 (1997), 17–46. https://doi.org/10.1016/S0927-5398(96)00011-4 doi: 10.1016/S0927-5398(96)00011-4
    [26] T. H. Roh, Forecasting the volatility of stock price index, Expert Syst. Appl., 33 (2007), 916–922. https://doi.org/10.1016/j.eswa.2006.08.001 doi: 10.1016/j.eswa.2006.08.001
    [27] M. Bildirici, Ö. Ö. Ersin, Improving forecasts of GARCH family models with the artificial neural networks:An application to the daily returns in Istanbul Stock Exchange, Expert Syst. Appl., 36 (2009), 7355–7362. https://doi.org/10.1016/j.eswa.2008.09.051 doi: 10.1016/j.eswa.2008.09.051
    [28] E. Hajizadeh, A. Seifi, M. H. F. Zarandi, I. B. Turksen, A hybrid modeling approach for forecasting the volatility of S & P 500 index return, Expert Syst. Appl., 39 (2012), 431–436. https://doi.org/10.1016/j.eswa.2011.07.033 doi: 10.1016/j.eswa.2011.07.033
    [29] W. Kristjanpoller, M. C. Minutolo, Gold price volatility: A forecasting approach using the Artificial Neural Network-GARCH model, Expert Syst. Appl., 42 (2015), 7245–7251. https://doi.org/10.1016/j.eswa.2015.04.058 doi: 10.1016/j.eswa.2015.04.058
    [30] N. Nikolaev, P. Tino, E. Smirnov, Time-dependent series variance learning with recurrent mixture density networks, Neurocomputing, 122 (2013), 501–512. https://doi.org/10.1016/j.neucom.2013.05.014 doi: 10.1016/j.neucom.2013.05.014
    [31] H. Y. Kim, C. H. Won, Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models, Expert Syst. Appl., 103 (2018), 25–37. https://doi.org/10.1016/j.eswa.2018.03.002 doi: 10.1016/j.eswa.2018.03.002
    [32] W. K. Liu, M. K. P. So, A GARCH model with artificial neural networks, Information, 11 (2020), 489. https://doi.org/10.3390/info11100489 doi: 10.3390/info11100489
    [33] D. Salinas, V. Flunkert, J. Gasthaus, T. Januschowski, DeepAR: Probabilistic forecasting with autoregressive recurrent networks, Int. J. Forecast., 36 (2020), 1181–1191. https://doi.org/10.1016/j.ijforecast.2019.07.001 doi: 10.1016/j.ijforecast.2019.07.001
    [34] P. Glasserman, D. Pirjol, W-shaped implied volatility curves and the Gaussian mixture model, Quant. Financ., 36 (2021), 1–21. https://doi.org/10.1080/14697688.2023.2165448 doi: 10.1080/14697688.2023.2165448
    [35] L. Scrucca, M. Fop, T. B. Murphy, A. E. Raftery, mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J., 8 (2016), 289–317. https://doi.org/10.32614/RJ-2016-021 doi: 10.32614/RJ-2016-021
    © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)