Citation: Hongli Niu, Kunliang Xu. A hybrid model combining variational mode decomposition and an attention-GRU network for stock price index forecasting[J]. Mathematical Biosciences and Engineering, 2020, 17(6): 7151-7166. doi: 10.3934/mbe.2020367
As stock markets increasingly attract public attention, the precise prediction of stock price indices has become one of the most promising research topics in time series forecasting. The commonly used forecasting methods fall broadly into two classes: econometric methods and artificial intelligence (AI) based models. The latter, represented by artificial neural networks (ANNs), have been shown to outperform econometric methods in dealing with non-stationary and non-linear time series [1,2,3,4]. As an improvement on the traditional ANN, the recurrent neural network (RNN) [5] establishes connections between the hidden layer units, through which the dependency of data at different time points can be captured. This sequential structure makes the RNN especially suitable for predicting time series data [6]. By introducing three gates into the hidden units of the traditional RNN, the long short-term memory (LSTM) network overcomes shortcomings of the RNN such as vanishing and exploding gradients over long time spans [7]. Recently, LSTM has been widely used to predict time series and has obtained outstanding results [8,9]. The gated recurrent unit (GRU) network merges the three gates of the LSTM into a reset gate and an update gate, which effectively improves the computing efficiency of LSTM [10], and GRU has achieved better results than LSTM in several time series forecasting tasks [11,12]. In this paper, the attention mechanism is introduced to assign weights to different input elements of the GRU and obtain a more precise forecasting result.
To further improve the forecasting accuracy of stock price indices, hybrid models containing two or more individual models have been developed, in which the unique advantages of the individual models can be exploited. Following the "Divide-and-Conquer" principle, "Decomposition-and-Ensemble" is a typical framework employed in time series forecasting [13]. Its main idea is to decompose a raw complex sequence into several subseries with simple patterns, establish a prediction model for every subseries, and obtain the final result by summing the prediction results of the subseries [14]. Owing to their excellent performance, hybrid forecasting models are gradually becoming mainstream [15]. As a novel multiresolution technique originating from signal processing, variational mode decomposition (VMD) [16] is a completely non-recursive algorithm that can decompose the original series into multiple components, each with a specific bandwidth in the spectral domain. VMD has been shown to perform better than comparable methods, such as empirical mode decomposition (EMD) [14], in noise robustness and in the accuracy of component separation. In recent years, hybrid models based on VMD have been applied successfully in several fields. For instance, by integrating VMD with classical ANNs, Lahmiri [17] established the forecasting model VMD-PSO-BPNN for intraday stock price prediction. The experimental results on six stocks suggest that the hybrid model performs significantly better than the single PSO-BPNN model. However, there is no methodology for the optimal selection of the number of VMD subcomponents. In his follow-up research [18], the newly proposed VMD-GRNN model demonstrates higher accuracy than the EMD-based forecasting models in predicting WTI oil prices, the CANUS exchange rate and the NASDAQ 100 VIX when the number of subcomponents ranges from 6 to 12.
Similar results are reported in [19], in which VMD is combined with a GRNN optimized by particle swarm optimization (PSO) and the hybrid model is used to predict California electricity and Brent crude oil prices. The performances of EMD-based and VMD-based models are assessed, with the number of VMD subseries set equal to that of EMD. The above studies have confirmed the applicability and superiority of VMD in practice, but there is still room for improvement. Firstly, the optimal number of components decomposed by VMD is still difficult to determine, although the empirical results of [18] indicate that the forecasting quality of VMD-based models varies with the number of components. Secondly, the forecasting models mentioned above are all classical ANNs, which can be replaced with recurrent networks, such as RNN, LSTM and GRU, to further enhance the forecasting ability. Thirdly, the evaluation metrics are limited to error measures and do not consider the capability of correctly predicting the moving direction of the time series, which is of great significance in the short-term prediction of financial time series.
Several studies have combined VMD with the GRU network and its variants in hybrid forecasting models for various prediction tasks. For example, Zhu et al. [20] employed a hybrid model integrating VMD and a BiGRU network to forecast the daily natural rubber futures price and volatility, validating the effectiveness of this model. Their results indicated that the improvements in prediction performance largely depended on the degree of time-scale matching between the predicted target and the mode subseries. Li et al. [21] introduced an error correction strategy into a VMD-GRU hybrid model to enhance its performance in wind speed interval prediction, and experiments based on eight cases from two wind fields demonstrated that the proposed model is a highly qualified forecasting method. By combining GRU with VMD, Wang et al. [22] adopted a hybrid model for wind power interval prediction and proposed an optimization method based on constructed intervals for building high-quality training labels before applying the Adam algorithm for full training; the effectiveness of the VMD-GRU was confirmed in comparison with other models. However, it is worth noting that the historical elements input into the forecasting network play different roles when predicting the target value of a time series. In general, input values closer in time to the target value have a greater impact than those farther away. Moreover, the number of components needs to be preset in VMD, and a suitable choice is important for the accuracy of the final prediction. In this work, the original time series is first decomposed by VMD into an optimal number of subseries, determined according to the ratio of residual energy (rres), and an attention mechanism is then introduced into the GRU network to enhance the forecasting quality by assigning different weights to the input elements.
The contribution of this paper to the literature is to propose a novel hybrid model for the reliable prediction of stock price index time series, applied here to the London FTSE Index (FTSE) and the Nasdaq Index (IXIC). The evaluations indicate that, compared with its counterparts, including the single models and the traditional GRU-based models, the proposed VMD-AttGRU model presents more accurate and robust results as measured by the level forecasting indices. The introduction of the attention mechanism into the hybrid model VMD-GRU decreases the forecasting error while slightly reducing the ability of the model to correctly predict the direction.
Variational mode decomposition (VMD) is a recently developed non-recursive and adaptive data decomposition technique [16]. In the VMD-AttGRU model, VMD is used to decompose the original stock index $x(t), t=1,2,\ldots,N$ into $n$ components $c_i, i=1,2,\ldots,n$, which stand for different local oscillations ranging from high frequency to low frequency. Each mode $c_i$ is required to be mostly compact around a center frequency $\omega_i$. The bandwidth of a mode can be estimated as follows. First, for each mode $c_i$, the Hilbert transform is employed to compute the associated analytic signal, so that a unilateral frequency spectrum is obtained. Then, the spectrum of each mode is shifted to the baseband by mixing with an exponential tuned to its respective center frequency. Afterwards, the H1 Gaussian smoothness of the demodulated series is used to estimate the bandwidth. The resulting constrained variational problem can be expressed as follows:
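$$\min_{\{c_i\},\{\omega_i\}}\left\{\sum_{i=1}^{n}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*c_i(t)\right]e^{-j\omega_i t}\right\|_2^2\right\}\quad \text{s.t.}\quad \sum_{i=1}^{n}c_i(t)=x(t)$$ | (1) |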
where $\{c_i\}=\{c_1,c_2,\ldots,c_n\}$ and $\{\omega_i\}=\{\omega_1,\omega_2,\ldots,\omega_n\}$ denote the set of subcomponents and their corresponding center frequencies, respectively. $\partial_t$ denotes the partial derivative with respect to $t$, $\|\cdot\|_2$ denotes the $L^2$ norm, $\delta(t)$ represents the Dirac function, and $*$ denotes convolution.
To solve the optimization problem of constrained variational decomposition, an augmented Lagrangian function L is introduced:
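$$L(\{c_i\},\{\omega_i\},\lambda)=\alpha\sum_{i=1}^{n}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*c_i(t)\right]e^{-j\omega_i t}\right\|_2^2+\left\|x(t)-\sum_{i=1}^{n}c_i(t)\right\|_2^2+\left\langle\lambda(t),\,x(t)-\sum_{i=1}^{n}c_i(t)\right\rangle,$$ | (2) |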
in which α denotes the penalty parameter and λ(t) is the Lagrangian multiplier. To obtain the saddle point of the above function, which is also the solution of the original constrained problem, VMD adopts the alternating direction method of multipliers (ADMM) [23].
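For orientation, the ADMM iteration is usually carried out in the frequency domain. The updates below follow the standard VMD formulation of [16], rewritten in the notation of this section. The iteration counter $m$, the dual step size $\tau$ and the hat notation for Fourier transforms are symbols introduced here for illustration and do not appear in the text above.

```latex
\begin{aligned}
\hat{c}_i^{\,m+1}(\omega) &= \frac{\hat{x}(\omega)-\sum_{k\neq i}\hat{c}_k(\omega)+\hat{\lambda}^{m}(\omega)/2}{1+2\alpha\,(\omega-\omega_i^{m})^{2}},\\[4pt]
\omega_i^{m+1} &= \frac{\int_{0}^{\infty}\omega\,\bigl|\hat{c}_i^{\,m+1}(\omega)\bigr|^{2}\,d\omega}{\int_{0}^{\infty}\bigl|\hat{c}_i^{\,m+1}(\omega)\bigr|^{2}\,d\omega},\\[4pt]
\hat{\lambda}^{m+1}(\omega) &= \hat{\lambda}^{m}(\omega)+\tau\Bigl(\hat{x}(\omega)-\sum_{i=1}^{n}\hat{c}_i^{\,m+1}(\omega)\Bigr).
\end{aligned}
```

In the first update, the sum over $k\neq i$ uses the most recent available estimate of each other mode. Each mode update acts as a Wiener-type filter centered at $\omega_i$, and each center frequency is re-estimated as the center of gravity of the power spectrum of its mode.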
Prior to VMD, the number of components n must be determined. If n is too large, additional computing resources are consumed; if n is too small, the decomposition may be insufficient and the forecasting results inaccurate. The ratio of residual energy to original data sequence energy, rres, is used to determine the optimal number, and is formulated as follows:
$$r_{res}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{x(t)-\sum_{i=1}^{n}c_i(t)}{x(t)}\right|$$ | (3) |
where rres measures the residual left after decomposition and can be used as the optimization index of the VMD process. Empirically, the component number is fixed when rres is smaller than 1% or shows no obvious downward trend [24].
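As a minimal sketch of Eq. (3), the function below computes rres for a series and a given set of components. The function name, the toy numbers and the use of NumPy are assumptions made for illustration; the components would in practice come from a VMD implementation, which is not shown here.

```python
import numpy as np

def residual_energy_ratio(x, components):
    """Eq. (3): mean absolute relative residual between x(t) and the summed components."""
    x = np.asarray(x, dtype=float)
    reconstruction = np.sum(np.asarray(components, dtype=float), axis=0)
    return float(np.mean(np.abs((x - reconstruction) / x)))

# Toy example (values are illustrative only, not taken from the paper).
x = np.array([100.0, 200.0, 400.0])
components = np.array([
    [60.0, 120.0, 240.0],   # first component c_1(t)
    [39.0, 78.0, 160.0],    # second component c_2(t)
])
r_res = residual_energy_ratio(x, components)
below_one_percent = r_res < 0.01   # first stopping criterion named in the text
```

Evaluating this function for increasing n gives the kind of curve on which the 1% threshold, or the flattening of the downward trend, can be checked.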
The long short-term memory (LSTM) network [8] introduces the "gate" mechanism to improve the conventional recurrent neural network (RNN): it replaces the hidden layer nodes of the RNN with special memory cells. Each memory cell contains three gates, the input gate $i_t$, the forget gate $f_t$ and the output gate $o_t$, which filter and process historical states and information, so that the problems of vanishing and exploding gradients can be effectively mitigated. The LSTM has been successfully applied in time series prediction [8,9]. The gated recurrent unit (GRU) network [10] merges the three gates of the LSTM into two gates, the reset gate $r_t$ and the update gate $z_t$, and achieves better performance in time series forecasting tasks [25]. The reset gate measures how much historical information is kept at the current moment and how much of the latest information is added, which helps to capture short-term dependencies in the series, while the update gate determines the degree to which historical information is "forgotten", so that information of arbitrary length from the input $x_t$ can be memorized effectively. The basic steps of GRU are as follows:
At first, the reset gate $r_t$ and update gate $z_t$ at the current state (time $t$) are established by the latest input $x_t$ and the hidden state produced by the previous cell $h_{t-1}$, and the outputs of the two gates are respectively given as:
$$r_t=\sigma(x_tU_r+h_{t-1}W_r+b_r)$$ | (4) |
$$z_t=\sigma(x_tU_z+h_{t-1}W_z+b_z)$$ | (5) |
Secondly, the current candidate hidden state $\tilde{h}_t$ can be formulated:
$$\tilde{h}_t=\tanh(x_tU_h+(h_{t-1}*r_t)W_h+b_h)$$ | (6) |
Finally, the current hidden state $h_t$ is computed as a linear combination of the current candidate hidden state $\tilde{h}_t$ and the previous hidden state $h_{t-1}$, where the weighting coefficients sum to 1:
$$h_t=(1-z_t)*\tilde{h}_t+z_t*h_{t-1}$$ | (7) |
where $U_r$, $U_z$, $U_h$ and $W_r$, $W_z$, $W_h$ are the weight coefficient matrices, $b_r$, $b_z$ and $b_h$ denote the corresponding bias vectors, $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid function and hyperbolic tangent function respectively, and $*$ indicates element-wise multiplication.
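The four GRU equations can be sketched as a single forward step in NumPy. This is an illustrative sketch only: the parameter dictionary, the random initialisation and the chosen sizes (one input feature, 50 hidden units, five time steps) are assumptions, and no training is shown.

```python
import numpy as np

def sigmoid(a):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + np.exp(-a))

def init_params(d_in, d_h, rng, scale=0.1):
    """Random weights for the r, z and h blocks (initialisation scheme is an assumption)."""
    p = {}
    for g in ("r", "z", "h"):
        p["U" + g] = rng.normal(0.0, scale, size=(d_in, d_h))
        p["W" + g] = rng.normal(0.0, scale, size=(d_h, d_h))
        p["b" + g] = np.zeros(d_h)
    return p

def gru_step(x_t, h_prev, p):
    """One GRU step following Eqs. (4)-(7); '*' is element-wise multiplication."""
    r_t = sigmoid(x_t @ p["Ur"] + h_prev @ p["Wr"] + p["br"])               # Eq. (4)
    z_t = sigmoid(x_t @ p["Uz"] + h_prev @ p["Wz"] + p["bz"])               # Eq. (5)
    h_tilde = np.tanh(x_t @ p["Uh"] + (h_prev * r_t) @ p["Wh"] + p["bh"])   # Eq. (6)
    return (1.0 - z_t) * h_tilde + z_t * h_prev                             # Eq. (7)

rng = np.random.default_rng(0)
params = init_params(d_in=1, d_h=50, rng=rng)
h = np.zeros(50)
for value in [0.1, 0.2, 0.3, 0.4, 0.5]:   # five illustrative input points
    h = gru_step(np.array([value]), h, params)
```

Because Eq. (7) is a convex combination of a tanh output and the previous state, the hidden state stays within (-1, 1) when it starts at zero.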
The attention mechanism originates from the fact that the human brain focuses on only specific parts of the visual field when recognizing something [26]. In time series prediction, not all elements in the input series contribute equally to the value of the context vector at each time step t, a fact that is often ignored by conventional forecasting networks. Therefore, the principle of the attention mechanism built into a neural network is to select crucial elements and give more weight to them, rather than taking all elements into account equally. That is, the attention mechanism is a deep learning algorithm for identifying the most relevant inputs. By ignoring irrelevant information and amplifying the needed information, the processing efficiency of the input information is greatly improved. Recently, the attention mechanism has been applied successfully in computational neuroscience [27], text representations [28] and image description [29]. Figure 1 depicts the calculation of the attention value in three steps, through which different weights $w_i$ are assigned to the elements of the input series to highlight the important subset of the inputs as the model is trained. Every element of the input data set is assumed to contain an address (Key) and a value (Value). The given goal is denoted as G, and the attention weight is the result to be calculated. In the figure, F(G, Key) is adopted to calculate the relevancy between the given target G and the address Key. $R_i$ and $w_i$ (i = 1, 2, ..., m) represent respectively the relevance and the attention weight of the ith element of the input sequence at time t. The realization of the attention mechanism can be formulated as follows:
$$e_t=\mathrm{Attend}(x_t,s_{t-1},w_{t-1})$$ | (8) |
$$w_t^j=\frac{\exp(e_t^j)}{\sum_{k=1}^{N}\exp(e_t^k)}$$ | (9) |
$$\hat{x}_t^j=w_t^jx_t^j$$ | (10) |
where $e_t$ denotes the attention score, which is defined by the input data $x_t$, the previous state $s_{t-1}$ and the previous attention weight $w_{t-1}$.
The specific implementation of the attention mechanism used in this work follows [30]. In the first step, the relevancy between each previous input element and the output element is computed. In the second step, the softmax formula is applied to convert the relevancies into probabilities. In the third step, the obtained probabilities are multiplied by the hidden representation of the corresponding input features, so that each product stands for the contribution of that feature to the forecast value, and the contributions of all inputs are combined to form the input used to forecast the next value.
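The softmax weighting of Eqs. (9) and (10) can be illustrated with a short NumPy sketch. The scoring function Attend of Eq. (8) is not specified in this section, so the scores are simply passed in as an array here; that choice, the function names and the example values are assumptions made for illustration.

```python
import numpy as np

def softmax(e):
    """Eq. (9): convert attention scores into weights that sum to 1."""
    e = np.asarray(e, dtype=float)
    shifted = np.exp(e - e.max())   # subtracting the maximum avoids overflow
    return shifted / shifted.sum()

def apply_attention(x_t, e_t):
    """Eq. (10): scale each input element by its attention weight."""
    w_t = softmax(e_t)
    return w_t, w_t * np.asarray(x_t, dtype=float)

# Five lagged inputs; scores that grow towards the most recent point (hypothetical).
x_t = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
scores = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
weights, weighted_inputs = apply_attention(x_t, scores)
```

With equal scores every element receives the same weight 1/N, so the weighted input is a uniformly scaled copy of the original window; unequal scores shift the emphasis towards the higher-scoring elements.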
In view of the advantages of VMD, the attention mechanism and the GRU network, we construct a hybrid model named VMD-AttGRU by combining the three techniques. In this model, VMD is used to decompose the original time series into several components. The Attention-GRU (abbreviated as AttGRU) is used to establish a forecasting model for each component and obtain the predicted output separately; the GRU layer takes the output of the attention layer as its input, so that the capability of the conventional GRU network is improved. The final forecasting result is calculated by summing the separate predicted outputs obtained by AttGRU. The flow chart in Figure 2 depicts the implementation process, in which the VMD-AttGRU operation is carried out as follows.
Step 1: VMD is used to decompose the stock price index series $x(t), t=1,2,\ldots,N$, into n mutually independent subseries, denoted by IMF1, IMF2, …, IMFn, where n is determined by a specific criterion. The initial series is reconstructed in terms of the IMFs as:
$$x(t)=\sum_{k=1}^{n}\mathrm{IMF}_k(t)$$
Step 2: Each component IMF is split into training and test datasets at a fixed ratio, and the input and output sets are split according to the step size. The AttGRU network is utilized to train and establish the forecasting model based on the training dataset. The forecasting output of each IMF is obtained.
Step 3: The final predicted result of the original stock price index series is calculated by summing the separate predicted outputs.
Step 4: Multiple performance measures, i.e., MAE, RMSE, MAPE, TIC, and 𝐷stat, are adopted to evaluate the prediction capacity of VMD-AttGRU from different perspectives.
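Steps 1 to 3 can be sketched as a small decomposition-and-ensemble loop. In this sketch the components are synthetic and a naive persistence forecaster stands in for the per-component AttGRU network; both are placeholders chosen for illustration and are not the method used in this work.

```python
import numpy as np

def persistence_forecast(train, test):
    """Hypothetical stand-in forecaster: predict each test value by the previous observation."""
    return np.concatenate([train[-1:], test[:-1]])

def decompose_forecast_ensemble(components, train_ratio=0.8,
                                forecaster=persistence_forecast):
    """Forecast every component on its test split and sum the forecasts (Steps 2-3)."""
    components = np.asarray(components, dtype=float)
    split = int(components.shape[1] * train_ratio)
    forecasts = [forecaster(c[:split], c[split:]) for c in components]
    return split, np.sum(forecasts, axis=0)

# Synthetic "IMFs": a fast oscillation and a slow trend (illustrative only).
t = np.arange(100)
components = np.array([np.sin(2 * np.pi * t / 10), 0.5 * t])
x = components.sum(axis=0)            # Step 1 reconstruction: x(t) = sum of IMFs
split, forecast = decompose_forecast_ensemble(components)
actual_test = x[split:]
```

Any per-component forecaster with the same `(train, test)` signature could be substituted for the placeholder, and `forecast` can then be compared with `actual_test` using the measures listed in Step 4.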
In this work, the daily closing prices of the London FTSE Index (FTSE) and the Nasdaq Index (IXIC) are used to examine the validity of the proposed VMD-AttGRU model. The two selected stock price indices are both representative of the global stock markets and are regarded as important benchmarks of social and economic development. They are collected from the global stock price indices of the Wind database, where they are stored as [date, price] time series. The FTSE sample covers the period from 2007/03/09 to 2020/06/05 and contains 3348 data points, and the IXIC sample covers the period from 2007/02/20 to 2020/06/05 and also contains 3348 data points. In the experiments, the first 80% of each sample is used to train the model, and the remaining 20% is used as the test set. Figure 3 displays the price curves of FTSE and IXIC. Table 1 gives the details of the two selected stock price indices. Table 2 shows the descriptive statistics of the samples in terms of mean, standard deviation, skewness, kurtosis, the Jarque-Bera (JB) test for normality and the Augmented Dickey-Fuller (ADF) test for stationarity. With a standard deviation of 2114.71 for IXIC and 894.81 for FTSE, the IXIC is more volatile than the FTSE. The FTSE is negatively skewed with a skewness of −0.56, while the IXIC is positively skewed with a skewness of 0.67. Both have kurtosis less than 3, implying no leptokurtosis. The results of the JB test indicate that both the FTSE and IXIC price index series are distinctly non-Gaussian at the 5% significance level. The results of the ADF test suggest that both price series are non-stationary.
Table 1. Details of the two selected stock price indices.
Index | FTSE | IXIC |
Time period | 2007/03/09 ~ 2020/06/05 | 2007/02/20 ~ 2020/06/05 |
Total number | 3348 | 3348 |
Train sets | 2007/03/09 ~ 2017/10/13 | 2007/02/20 ~ 2017/10/05 |
Train number | 2678 | 2678 |
Test sets | 2017/10/16 ~ 2020/06/05 | 2017/10/06 ~ 2020/06/05 |
Test number | 670 | 670 |
Table 2. Descriptive statistics of the samples.
Index | Mean | Std. | Skewness | Kurtosis | JB test | ADF test |
FTSE | 6273.01 | 894.81 | −0.56 | 2.90 | 177.20* (0.00) | −0.25 (0.56) |
IXIC | 4320.50 | 2114.71 | 0.67 | 2.28 | 323.25* (0.00) | 2.13 (0.99) |
To reduce the impact of noise and facilitate the optimization process, each component $c(t), t=1,2,\ldots,N$ obtained by VMD is normalized to the range [0, 1] by the following min-max normalization formula:
$$c'(t)=\frac{c(t)-\min c(t)}{\max c(t)-\min c(t)}$$ | (11) |
The normalized data are then input into the AttGRU network for training and prediction. To obtain the real predicted value and compare it intuitively with the actual value, the normalized output $c'(t)$ is reverted to the original scale $c(t)$ after prediction as follows:
$$c(t)=c'(t)\left(\max c(t)-\min c(t)\right)+\min c(t)$$ | (12) |
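Equations (11) and (12) form a simple round trip, sketched below in NumPy. The function names and the example values are assumptions for illustration; the minimum and maximum must be stored at normalization time so that predictions can be mapped back to the original scale.

```python
import numpy as np

def minmax_fit(c):
    """Return the minimum and maximum of a component, needed by Eqs. (11) and (12)."""
    c = np.asarray(c, dtype=float)
    return float(c.min()), float(c.max())

def minmax_transform(c, c_min, c_max):
    """Eq. (11): scale a component to the range [0, 1]."""
    return (np.asarray(c, dtype=float) - c_min) / (c_max - c_min)

def minmax_inverse(c_norm, c_min, c_max):
    """Eq. (12): map normalized values back to the original scale."""
    return np.asarray(c_norm, dtype=float) * (c_max - c_min) + c_min

component = np.array([2.0, 4.0, 6.0, 10.0])     # illustrative values
c_min, c_max = minmax_fit(component)
normalized = minmax_transform(component, c_min, c_max)
restored = minmax_inverse(normalized, c_min, c_max)
```

The same `c_min` and `c_max` would be reused to invert the network's normalized predictions for that component before the component forecasts are summed.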
To better validate the robustness of the VMD-AttGRU prediction network, this work adopts five commonly used criteria to examine the model from various perspectives: the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), Theil inequality coefficient (TIC) and the directional statistic Dstat. The first four indices measure the level forecasting accuracy, and Dstat measures the correctness of the predicted direction of a time series in terms of percentage. They are respectively defined as follows:
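$$\mathrm{MAE}=\frac{1}{N}\sum_{t=1}^{N}\left|x_t-\hat{x}_t\right|$$ | (13) |
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x_t-\hat{x}_t\right)^2}$$ | (14) |
$$\mathrm{MAPE}=\frac{1}{N}\sum_{t=1}^{N}\left|\frac{x_t-\hat{x}_t}{x_t}\right|$$ | (15) |
$$\mathrm{TIC}=\frac{\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x_t-\hat{x}_t\right)^2}}{\sqrt{\frac{1}{N}\sum_{t=1}^{N}x_t^2}+\sqrt{\frac{1}{N}\sum_{t=1}^{N}\hat{x}_t^2}}$$ | (16) |
$$D_{stat}=100\%\times\frac{1}{N}\sum_{t=1}^{N}d_t,\qquad d_t=\begin{cases}1,&\text{if }(\hat{x}_t-\hat{x}_{t-1})(x_t-x_{t-1})\ge 0,\ t>1\\0,&\text{otherwise}\end{cases}$$ | (17) |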
where $x_t$ is the actual value, $\hat{x}_t$ is the forecast value, and N is the length of the forecast sample; the same applies hereinafter. The MAE measures the average absolute error between the actual series and the predicted series. The RMSE, which is more sensitive to outliers, measures the deviation between the actual and the predicted series. The MAPE computes the average relative error between the actual series and the predicted series in terms of percentage, while the directional statistic Dstat evaluates the capability of correctly predicting the moving direction of the time series. In general, smaller values of MAE, RMSE, MAPE and TIC indicate a smaller difference between the forecast and the actual values, that is, a more accurate model. A higher Dstat value corresponds to better performance of the model.
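The five criteria of Eqs. (13) to (17) can be computed together as in the following sketch. The function name and the four-point example are assumptions for illustration. MAPE is returned as a fraction, as in Eq. (15), and would be multiplied by 100 to be reported in percent; Dstat follows Eq. (17) literally, so the first point contributes 0 and the sum is divided by N.

```python
import numpy as np

def evaluate(actual, predicted):
    """Compute MAE, RMSE, MAPE, TIC and Dstat as defined in Eqs. (13)-(17)."""
    x = np.asarray(actual, dtype=float)
    x_hat = np.asarray(predicted, dtype=float)
    err = x - x_hat
    rmse = np.sqrt(np.mean(err ** 2))
    # d_t = 1 when predicted and actual changes have the same sign (or one is zero), t > 1.
    hits = (np.diff(x_hat) * np.diff(x)) >= 0
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(rmse),
        "MAPE": float(np.mean(np.abs(err / x))),
        "TIC": float(rmse / (np.sqrt(np.mean(x ** 2)) + np.sqrt(np.mean(x_hat ** 2)))),
        "Dstat": float(100.0 * hits.sum() / len(x)),
    }

# Illustrative values only, not data from the paper.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 102.0, 104.0]
scores = evaluate(actual, predicted)
```

In this example every error has magnitude 1, so MAE and RMSE both equal 1, and two of the three direction changes are matched, giving a Dstat of 50%.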
In this section, the predictive performance of the VMD-AttGRU model for stock price index forecasting is analyzed. To comprehensively demonstrate the advantages of the proposed hybrid model and the effectiveness of the attention mechanism in stock price index prediction, single models (LSTM, GRU, AttGRU) and the hybrid model VMD-GRU are considered for comparison. According to the "decomposition and ensemble" strategy, the prices are first decomposed by VMD, for which the number of subseries (IMFs) must be determined. Table 3 displays the ratio of residual energy rres of the VMD approach under different n for the two stock price indices. All rres values are below 1%. For FTSE, the downward tendency of rres tends to be stable when n is larger than 15, while for IXIC it tends to be stable when n is larger than 16. Therefore, the number of components is set to 15 for FTSE and 16 for IXIC.
n | FTSE | IXIC |
5 | 0.54% | 0.68% |
6 | 0.45% | 0.54% |
7 | 0.40% | 0.42% |
8 | 0.36% | 0.34% |
9 | 0.31% | 0.30% |
10 | 0.27% | 0.28% |
11 | 0.23% | 0.25% |
12 | 0.20% | 0.23% |
13 | 0.17% | 0.22% |
14 | 0.15% | 0.21% |
15 | 0.13% | 0.17% |
16 | 0.12% | 0.14% |
17 | 0.11% | 0.14% |
18 | 0.10% | 0.13% |
Taking FTSE as an example, Figure 4 displays the subseries obtained by VMD. They are listed from high to low frequency, depicting the different local oscillations embodied in the data series. It can be seen that the decomposed subseries are more regular than the original series, which helps to reduce the complexity of the datasets to be forecasted. Among them, the high frequency components, with relatively small values, reflect the detailed short-term volatility information of the original price series, and the low frequency components, composed of large values, represent the overall trend of the daily closing prices.
Next, the corresponding AttGRU prediction model is constructed for each decomposed IMF subseries. In the parameter setting, a historical lag of order 5 is used to predict the data of the next period, considering that the 5 trading days per week can simply be regarded as a cycle. In other words, the number of input data points is set to 5 and the number of outputs is set to 1. After repeated experiments, a 5 × 50 × 1 neural network is obtained by setting the number of hidden nodes to 50. For convenience, the number of epochs is set to 300 and the batch size to 64. All of the processes are implemented in Python 3.x running on a quad-core Intel Core i5 processor operating at 1.40 GHz with 8 GB of installed RAM.
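The lag-5 input and one-step output described above correspond to a sliding-window construction, sketched below in NumPy. The function name and the toy series are assumptions for illustration; the final reshape to `(samples, lag, 1)` reflects the input layout commonly expected by recurrent layers and is likewise an assumption about the implementation.

```python
import numpy as np

def make_windows(series, lag=5):
    """Build (input, target) pairs: `lag` consecutive points predict the next point."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    targets = series[lag:]
    return inputs, targets

series = np.arange(10, dtype=float)          # toy series 0, 1, ..., 9
inputs, targets = make_windows(series, lag=5)
inputs_3d = inputs[..., np.newaxis]          # shape (samples, lag, 1)
```

A series of length N yields N − 5 samples, each pairing five consecutive values with the value that immediately follows them.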
Figure 5 shows the comparison of the actual value and forecasted value by AttGRU for each subseries in the FTSE test set. It shows that the predicted curve is very close to the real curve of each subseries, demonstrating that the AttGRU network can make an accurate prediction of components with different frequency information.
Figure 6 shows the results of VMD-AttGRU on the test sets of the two stock price indices along with the other considered models: LSTM, GRU, AttGRU, and VMD-GRU. Overall, for both indices, the curves are close together, showing that the predicted curve of each model is near the real price curve. The curve of the VMD-AttGRU model is generally the closest to the actual curve, indicating the best prediction performance in this comparison. This can be further observed in the inset plots, where a volatile part of each dataset is magnified. We can therefore conclude that, among the compared models, the VMD-AttGRU model has the highest accuracy for stock price index prediction.
In order to further analyze the performance of various models, the predictive errors are also presented in Figure 4. It can be seen that the upper and lower bounds are not much different for single models. The prediction errors of single models are evidently larger than those of the hybrid model. The median of the VMD-AttGRU model is closest to 0, and the absolute values of the upper and lower quartiles are the smallest in the comparison group. The results further show that the relative error of the target model is relatively smaller and more concentrated, illustrating the better performance of the proposed model in stock price series data.
To quantitatively measure the predictive performance of each model, the evaluation criteria MAE, RMSE, MAPE, TIC, Dstat and processing time are calculated in Tables 4 and 5, and the bar graphs are given in Figure 7. It can be observed that:
Table 4. Evaluation results of the compared models for FTSE.
Models | MAE | RMSE | MAPE (%) | TIC | Dstat | Time (s) |
LSTM | 65.114 | 91.666 | 0.943 | 0.0063 | 50.00% | 42.136 |
GRU | 63.334 | 90.061 | 0.918 | 0.0062 | 49.70% | 45.844 |
AttGRU | 53.185 | 78.302 | 0.776 | 0.0054 | 49.10% | 48.012 |
VMD-GRU | 37.725 | 46.339 | 0.551 | 0.0032 | 98.19% | 687.125 |
VMD-AttGRU | 24.802 | 39.683 | 0.375 | 0.0027 | 98.04% | 744.332 |
Table 5. Evaluation results of the compared models for IXIC.
Models | MAE | RMSE | MAPE (%) | TIC | Dstat | Time (s) |
LSTM | 113.998 | 155.707 | 1.423 | 0.0100 | 49.85% | 42.887 |
GRU | 91.874 | 133.213 | 1.176 | 0.0085 | 50.00% | 45.878 |
AttGRU | 87.607 | 131.814 | 1.132 | 0.0084 | 49.10% | 49.032 |
VMD-GRU | 83.503 | 107.762 | 1.012 | 0.0069 | 98.19% | 731.371 |
VMD-AttGRU | 65.925 | 94.245 | 0.858 | 0.0060 | 94.88% | 798.146 |
1) The hybrid forecasting models following the decomposition-and-ensemble strategy outperform the single models comprehensively, especially for the directional statistic Dstat, which is approximately 50% for the single models but improves by more than 40 percentage points after combining with VMD. For the error-type performance measures MAE, RMSE, MAPE, and TIC, the values of the VMD-based models are all smaller than those of the single models, which also verifies the superior performance of the hybrid models in stock price index forecasting.
2) When the attention mechanism is introduced into the GRU network, the error-type performance measures decrease, indicating an improvement in forecasting accuracy. Taking MAPE for the FTSE data as an example, the MAPE of GRU is 0.918, while that of AttGRU is 0.776, a reduction of about 15.5%. VMD-GRU has a MAPE value of 0.551 and VMD-AttGRU has a value of 0.375, a reduction of about 31.9%. However, the accuracy measured by Dstat decreases for both FTSE and IXIC data after adding the attention mechanism. Specifically, the values for AttGRU and VMD-AttGRU are smaller than those for GRU and VMD-GRU, respectively. Considering that the final predicted result is the linear summation of the predicted results of the different IMFs, so that the forecasting quality of each IMF strongly affects the final result, Figure 8 further compares the Dstat value of each IMF predicted by AttGRU and GRU in the VMD-based hybrid models; the Dstat values of the IMFs predicted by GRU are generally higher than those predicted by AttGRU for both the FTSE and IXIC series. These results indicate that the introduction of the attention mechanism does not improve the prediction accuracy in terms of direction.
3) The prediction precision of the proposed VMD-AttGRU model is higher than that of the other compared models on every measure except Dstat. For the FTSE and IXIC data, the Dstat values of VMD-GRU are both the largest, reaching 98.19%, while those of VMD-AttGRU are 98.04% and 94.88%, respectively, which are 0.15 and 3.31 percentage points lower than the values obtained by VMD-GRU.
4) The processing time of the hybrid models is significantly longer than that of the single models, because a forecasting model must be established and trained for each IMF. In the comparison of AttGRU with GRU as well as VMD-AttGRU with VMD-GRU, the introduction of the attention layer also leads to a longer processing time. Compared with the LSTM, the processing time of GRU is slightly longer for both FTSE and IXIC, indicating that in these experiments the processing by the gates of each hidden layer unit in GRU is slower than that in LSTM.
In brief, following the "Divide-and-Conquer" principle, the proposed hybrid model VMD-AttGRU improves the forecasting accuracy in terms of error-type performance measures. On the other hand, the introduction of the attention mechanism weakens the correctness of the predicted direction. Moreover, the "Decomposition-and-Ensemble" framework inevitably requires more data processing, which leads to a higher time cost while improving the forecasting quality.
A hybrid model VMD-AttGRU is proposed in this study to forecast the stock price indices FTSE and IXIC. Since the price series are non-stationary and non-linear, the VMD approach is applied to weaken the adverse effect of noise on prediction. Moreover, considering that not all elements in the input series contribute equally to the forecasting task, the attention mechanism is used to assign weights to different input elements of the GRU network, which yields a more accurate forecasting result. Compared with the single models (LSTM, GRU, and AttGRU) and the hybrid model VMD-GRU, the proposed VMD-AttGRU model exhibits superiority in improving the forecasting accuracy of stock price indices, as shown by analyzing its error-type performance (MAE, RMSE, MAPE, and TIC) together with its trend-type performance (Dstat). The proposed VMD-AttGRU model can provide an effective paradigm for the prediction of financial time series, which could also be applied to predicting time series in other fields.
The work was partially supported by the Humanities and Social Sciences Foundation of Ministry of Education of China (No. 18YJCZH134) and the Fundamental Research Funds for the Central Universities (No. FRF-BR-18-001B).
The authors declare that there is no conflict of interest.
[1] | C. Zhang, H. Pan, Y. Ma, X. Huang, Analysis of Asia Pacific stock markets with a novel multiscale model, Phys. A, 534 (2019), 120939. |
[2] | A. L. D. Loureiro, V. L. Miguéis, L. F. M. da Silva, Exploring the use of deep neural networks for sales forecasting in fashion retail, Decis. Support Syst., 114 (2018), 81-93. |
[3] | J. Wang, J. Wang, Forecasting stock market indexes using principle component analysis and stochastic time effective neural networks, Neurocomputing, 156 (2015), 68-78. doi: 10.1016/j.neucom.2014.12.084 |
[4] | Y. Xu, S. B. Cohen, Stock movement prediction from tweets and historical prices, In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018. |
[5] | D.P. Mandic, J.A. Chambers, Exploiting inherent relationships in RNN architectures, Neural Networks, 12 (1999), 1341-1345. doi: 10.1016/S0893-6080(99)00076-3 |
[6] | T. Deng, X. He, Z. Zeng, Recurrent neural network for combined economic and emission dispatch, Appl. Intell., 48 (2018), 2180-2198. |
[7] | S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput., 9 (1997), 1735-1780. |
[8] | K. Wang, X. Qi, H. Liu, Photovoltaic power forecasting based LSTM-Convolutional Network, Energy, 189 (2019), 116225. |
[9] | Z. Karevan, J. A. K. Suykens, Transductive LSTM for time-series prediction: An application to weather forecasting, Neural Networks, 125 (2020), 1-9. |
[10] | B. Zhao, Z. P. Wang, W. J. Ji, X. Gao, X. B. Li, A Short-term Power Load Forecasting Method Based on Attention Mechanism of CNN-GRU, Power Syst. Technol., 12 (2019). |
[11] | Z. Y. Peng, S. Peng, L. D. Fu, B. C. Lu, J. J. Tang, K. Wang, et al., A novel deep learning ensemble model with data denoising for short-term wind speed forecasting, Energy Convers. Manage., 207 (2020), 112524. |
[12] | W. Y. Wu, W. L. Liao, J. Miao, G. L. Du, Using Gated Recurrent Unit Network to Forecast Short-Term Load Considering Impact of Electricity Price, Energy Procedia, 158 (2019), 3369-3374. |
[13] | J. Zhang, D. Li, Y. Hao, Z. Tan, A hybrid model using signal processing technology, econometric models and neural network for carbon spot price forecasting, J. Cleaner Prod., 204 (2018), 958-964. |
[14] | J. Wang, L. Y. Tang, Y. Y. Luo, P. Ge, A weighted EMD-based prediction model based on TOPSIS and feed forward neural network for noised time series, Knowl. Based Syst., 132 (2017), 167-178. doi: 10.1016/j.knosys.2017.06.022 |
[15] | J. Cao, Z. Li, J. Li, Financial time series forecasting model based on CEEMDAN and LSTM, Phys. A, 519 (2019), 127-139. doi: 10.1016/j.physa.2018.11.061 |
[16] | K. Dragomiretskiy, D. Zosso, Variational mode decomposition, IEEE Trans. Signal Process., 62 (2014), 531-544. |
[17] | S. Lahmiri, Intraday stock price forecasting based on variational mode decomposition, J. Comput. Sci., 12 (2016), 23-27. doi: 10.1016/j.jocs.2015.11.011 |
[18] | S. Lahmiri, A variational mode decomposition approach for analysis and forecasting of economic and financial time series, Expert Syst. Appl., 55 (2016), 268-273. doi: 10.1016/j.eswa.2016.02.025 |
[19] | S. Lahmiri, Comparing variational and empirical mode decomposition in forecasting day-ahead energy prices, IEEE Syst. J., 11 (2015), 1907-1910. |
[20] | Q. Zhu, F. Zhang, S. Liu, Y. Wu, L. Wang, A hybrid VMD-BiGRU model for rubber futures time series forecasting, Appl. Soft Comput., 84 (2019), 105739. |
[21] | C. Li, G. Tang, X. Xue, A. Saeed, X. Hu, Short-term wind speed interval prediction based on ensemble GRU model, IEEE Trans. Sustainable Energy, 11 (2020), 1370-1380. doi: 10.1109/TSTE.2019.2926147 |
[22] | R. Wang, C. Li, W. Fu, G. Tang, Deep learning method based on gated recurrent unit and variational mode decomposition for short-term wind power interval prediction, IEEE Trans. Neural Networks Learn. Syst., 31 (2019), 3814-3827. |
[23] | S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Now Foundations and Trends, 2011. |
[24] | Y. Liu, C. Yang, K. Huang, W. Cui, Non-ferrous metals price forecasting based on variational mode decomposition and LSTM network, Knowl. Based Syst., 188 (2020), 105006. |
[25] | J. W. E, J. M. Ye, L. L. He, H. H. Jin, Energy price prediction based on independent component analysis and gated recurrent unit neural network, Energy, 189 (2019), 116278. |
[26] | S. Chen, L. Ge, Exploring the attention mechanism in LSTM-based Hong Kong stock price movement prediction, Quant. Finance, 19 (2019), 1507-1515. doi: 10.1080/14697688.2019.1622287 |
[27] | R. Desimone, J. Duncan, Neural mechanisms of selective visual attention, Annu. Rev. Neurosci., 18 (1995), 193-222. doi: 10.1146/annurev.ne.18.030195.001205 |
[28] | M. T. Luong, H. Pham, C. D. Manning, Effective approaches to attention-based neural machine translation, arXiv: 1508.04025. |
[29] | L. Li, S. Tang, Y. Zhang, L. Deng, Q. Tian, GLA: global-local attention for image description, IEEE Trans. Multimedia, 20 (2017), 726-737. |
[30] | S. Wang, X. Wang, S. Wang, D. Wang, Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting, Int. J. Electric. Power Energy Syst., 109 (2019), 470-479. doi: 10.1016/j.ijepes.2019.02.022 |
 | FTSE | IXIC |
Time period | 2007/03/09 ~ 2020/06/05 | 2007/02/20 ~ 2020/06/05 |
Total number | 3348 | 3348 |
Train sets | 2007/03/09 ~ 2017/10/13 | 2007/02/20 ~ 2017/10/05 |
Train number | 2678 | 2678 |
Test sets | 2017/10/16 ~ 2020/06/05 | 2017/10/06 ~ 2020/06/05 |
Test number | 670 | 670 |
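The sample counts in the table above are consistent with a chronological 80/20 split of the 3348 observations. The sketch below shows that arithmetic. The 0.8 ratio is inferred from the counts rather than stated in this section, and `chronological_split` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    observations: Sequence[float], train_ratio: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series without shuffling: earliest part for training."""
    cut = int(len(observations) * train_ratio)  # floor of n * ratio
    return list(observations[:cut]), list(observations[cut:])


# Placeholder data: only the length (3348) matters for the arithmetic.
series = [float(i) for i in range(3348)]
train, test = chronological_split(series)
print(len(train), len(test))  # prints: 2678 670
```

Keeping the split in time order means that every test observation lies after every training observation, as in the date ranges listed in the table.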
Index | Mean | Std. | Skewness | Kurtosis | JB test | ADF test |
FTSE | 6273.01 | 894.81 | −0.56 | 2.90 | 177.20* (0.00) | −0.25 (0.56) |
IXIC | 4320.50 | 2114.71 | 0.67 | 2.28 | 323.25* (0.00) | 2.13 (0.99) |
n | FTSE | IXIC |
5 | 0.54% | 0.68% |
6 | 0.45% | 0.54% |
7 | 0.40% | 0.42% |
8 | 0.36% | 0.34% |
9 | 0.31% | 0.30% |
10 | 0.27% | 0.28% |
11 | 0.23% | 0.25% |
12 | 0.20% | 0.23% |
13 | 0.17% | 0.22% |
14 | 0.15% | 0.21% |
15 | 0.13% | 0.17% |
16 | 0.12% | 0.14% |
17 | 0.11% | 0.14% |
18 | 0.10% | 0.13% |
Models | MAE | RMSE | MAPE (%) | TIC | Dstat | Time (s) |
LSTM | 65.114 | 91.666 | 0.943 | 0.0063 | 50.00% | 42.136 |
GRU | 63.334 | 90.061 | 0.918 | 0.0062 | 49.70% | 45.844 |
AttGRU | 53.185 | 78.302 | 0.776 | 0.0054 | 49.10% | 48.012 |
VMD-GRU | 37.725 | 46.339 | 0.551 | 0.0032 | 98.19% | 687.125 |
VMD-AttGRU | 24.802 | 39.683 | 0.375 | 0.0027 | 98.04% | 744.332 |
Models | MAE | RMSE | MAPE (%) | TIC | Dstat | Time (s) |
LSTM | 113.998 | 155.707 | 1.423 | 0.0100 | 49.85% | 42.887 |
GRU | 91.874 | 133.213 | 1.176 | 0.0085 | 50.00% | 45.878 |
AttGRU | 87.607 | 131.814 | 1.132 | 0.0084 | 49.10% | 49.032 |
VMD-GRU | 83.503 | 107.762 | 1.012 | 0.0069 | 98.19% | 731.371 |
VMD-AttGRU | 65.925 | 94.245 | 0.858 | 0.0060 | 94.88% | 798.146 |