
In this study we investigate alternative interpretable machine learning ("IML") models in the context of probability of default ("PD") modeling for the large corporate asset class. IML models have become increasingly prominent in highly regulated industries where there are concerns over the unintended consequences of deploying black box models that may be deemed conceptually unsound. In the context of banking and in wholesale portfolios, there are challenges around using models where the outcomes may not be explainable, both in terms of the business use case as well as meeting model validation standards. We compare various IML models (deep neural networks and explainable boosting machines), including standard approaches such as logistic regression, using a long and robust history of corporate borrowers. We find that there are material differences between the approaches in terms of dimensions such as model predictive performance and the importance or robustness of risk factors in driving outcomes, including conflicting conclusions depending upon the IML model and the benchmarking measure considered. These findings call into question the value of the modest pickup in performance with the IML models relative to a more traditional technique, especially if these models are to be applied in contexts that must meet supervisory and model validation standards.
Citation: Michael Jacobs, Jr. Benchmarking alternative interpretable machine learning models for corporate probability of default[J]. Data Science in Finance and Economics, 2024, 4(1): 1-52. doi: 10.3934/DSFE.2024001
[1] | Lihe Liang, Jinying Cui, Juanjuan Zhao, Yan Qiang, Qianqian Yang . Ultra-short-term forecasting model of power load based on fusion of power spectral density and Morlet wavelet. Mathematical Biosciences and Engineering, 2024, 21(2): 3391-3421. doi: 10.3934/mbe.2024150 |
[2] | Xiaoqiang Dai, Kuicheng Sheng, Fangzhou Shu . Ship power load forecasting based on PSO-SVM. Mathematical Biosciences and Engineering, 2022, 19(5): 4547-4567. doi: 10.3934/mbe.2022210 |
[3] | Faisal Mehmood Butt, Lal Hussain, Anzar Mahmood, Kashif Javed Lone . Artificial Intelligence based accurately load forecasting system to forecast short and medium-term load demands. Mathematical Biosciences and Engineering, 2021, 18(1): 400-425. doi: 10.3934/mbe.2021022 |
[4] | Mingju Chen, Fuhong Qiu, Xingzhong Xiong, Zhengwei Chang, Yang Wei, Jie Wu . BILSTM-SimAM: An improved algorithm for short-term electric load forecasting based on multi-feature. Mathematical Biosciences and Engineering, 2024, 21(2): 2323-2343. doi: 10.3934/mbe.2024102 |
[5] | Yongquan Zhou, Yanbiao Niu, Qifang Luo, Ming Jiang . Teaching learning-based whale optimization algorithm for multi-layer perceptron neural network training. Mathematical Biosciences and Engineering, 2020, 17(5): 5987-6025. doi: 10.3934/mbe.2020319 |
[6] | Yanmei Jiang, Mingsheng Liu, Jianhua Li, Jingyi Zhang . Reinforced MCTS for non-intrusive online load identification based on cognitive green computing in smart grid. Mathematical Biosciences and Engineering, 2022, 19(11): 11595-11627. doi: 10.3934/mbe.2022540 |
[7] | Fengyong Li, Meng Sun . EMLP: short-term gas load forecasting based on ensemble multilayer perceptron with adaptive weight correction. Mathematical Biosciences and Engineering, 2021, 18(2): 1590-1608. doi: 10.3934/mbe.2021082 |
[8] | Chun Li, Ying Chen, Zhijin Zhao . Frequency hopping signal detection based on optimized generalized S transform and ResNet. Mathematical Biosciences and Engineering, 2023, 20(7): 12843-12863. doi: 10.3934/mbe.2023573 |
[9] | Hao Yuan, Qiang Chen, Hongbing Li, Die Zeng, Tianwen Wu, Yuning Wang, Wei Zhang . Improved beluga whale optimization algorithm based cluster routing in wireless sensor networks. Mathematical Biosciences and Engineering, 2024, 21(3): 4587-4625. doi: 10.3934/mbe.2024202 |
[10] | Chongyi Tian, Longlong Lin, Yi Yan, Ruiqi Wang, Fan Wang, Qingqing Chi . Photovoltaic power prediction based on dilated causal convolutional network and stacked LSTM. Mathematical Biosciences and Engineering, 2024, 21(1): 1167-1185. doi: 10.3934/mbe.2024049 |
Power load forecasting can be divided into long-term, medium-term, short-term and ultra-short-term forecasting according to the forecasting time scale. Short-term load forecasting is the most typical case, as it is a critical basis for maintaining the stable operation of the power system and improving economic benefits. Accurate short-term forecasts help the dispatching department decide how to control power dispatch in the next step, and they can effectively reduce resource waste and improve economic benefits [1,2,3].
At present, load prediction methods primarily include statistical methods, such as multiple linear regression [4], the Kalman filter [5,6] and the autoregressive moving average, and machine learning methods, such as the support vector machine [7,8,9], expert systems and artificial neural networks [10]. Statistical models are too simple: they can only handle linear data and cannot reasonably capture the inherent characteristics of nonlinear data. Machine learning methods handle nonlinear data well, but they cannot extract time-series features effectively. With its development, deep learning has become the focus of load forecasting. Deep neural networks are widely employed in load prediction, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) [11] and long short-term memory (LSTM) networks [12]. CNNs can effectively extract multidimensional data features, but they cannot deal with time-series features efficiently. RNNs can model long time-series data through a cyclic structure, but as the amount of load data grows they suffer from vanishing or exploding gradients. As a special RNN, the LSTM network mitigates this deficiency of the RNN through a gate structure. Nonetheless, as the training data increase, it becomes difficult to select parameters for the LSTM network [13]. To process multidimensional power load data effectively, a CNN-LSTM hybrid neural network prediction method was proposed in [14], in which features are extracted by a two-dimensional convolutional layer to reduce the training difficulty of the LSTM network.
The authors of [15] showed that using a CNN to extract data features, using a gated recurrent unit (GRU) network to avoid the large number of training parameters in the LSTM network and introducing an attention mechanism can effectively improve the accuracy of power load forecasting. The CNN-BiGRU network in [16] improves data utilization by making data flow bidirectionally in the network layer. Since CNNs cannot predict time-series data well, temporal convolutional networks (TCNs) can be employed for sequence prediction, and they extract time-series features better than CNNs and RNNs [17]. Tian et al. [18] proposed a short-term wind speed prediction model that uses empirical mode decomposition and an improved sparrow algorithm to optimize an LSTM neural network. The model decomposes the ultra-short-term wind speed series by empirical mode decomposition, predicts the components with the LSTM network and optimizes the LSTM hyperparameters with the improved sparrow optimization algorithm. In [19], a short-term wind speed prediction model based on local mean decomposition (LMD) and a least squares support vector machine (LSSVM) with a combined kernel function is proposed. The wind speed data are decomposed by the LMD algorithm and predicted by the LSSVM, and the firefly algorithm is employed to optimize the parameter selection. The authors of [20] proposed a temporal convolutional network with a multi-attention mechanism. By introducing an Inception structure into the TCN, multidimensional information is extracted by convolutional kernels of different scales, which effectively improves the accuracy of ultra-short-term load prediction.

The authors of [21] proposed a combined prediction model based on empirical mode decomposition to forecast traffic flow state information. The data are decomposed into components by empirical mode decomposition, the optimal prediction method for each component is selected based on the results of adaptive analysis and the weights of the combined model are optimized by the fruit fly algorithm. The authors of [22] proposed a combined prediction model based on ensemble empirical mode decomposition and a regularized extreme learning machine for wind speed prediction. The wind speed series decomposed by ensemble empirical mode decomposition are predicted by the regularized extreme learning machine, and the reliability of the prediction model is improved by cross-validation. In [23], complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and variational mode decomposition (VMD) are first employed to decompose the power load data. Second, the non-stationary and stationary components are predicted by deep bidirectional long short-term memory (DBILSTM) and mixed logistic regression (MLR) networks, respectively. Finally, the prediction accuracy is improved by combining the component predictions through data reconstruction.
This paper presents a multi-model fusion method for short-term power load forecasting based on VMD, an improved whale optimization algorithm (WOA), a wavelet temporal convolutional network (WTCN)-BiGRU network and CatBoost. First, VMD is employed to decompose the power load data into different intrinsic mode functions (IMFs), and weather characteristic factors are added to each intrinsic mode. Then, the TCN is utilized to extract multidimensional data features, and the extracted features are sent to the BiGRU network for model training. Important information is effectively retained by adding an attention mechanism. The model adopts the improved WOA (IWOA) to optimize the hyperparameter selection of the TCN-BiGRU-attention network, which makes the parameter selection of the network layers more effective, and it predicts the stationary component of the sequence in parallel with the CatBoost network. The parallel prediction results of the two networks are combined by the mean absolute percentage error-reciprocal weight (MAPE-RW) algorithm. The model accuracy is verified on actual load data from an Australian region, and the model performance is evaluated using the root mean square error (RMSE) and mean absolute percentage error (MAPE). The contributions of this paper can be summarized as follows:
1) A multi-model fusion load forecasting method is proposed. First, the VMD-based IWOA-WTCN-BiGRU-attention prediction model is constructed as Model one. Second, the CatBoost prediction model tuned by a random search algorithm is adopted as Model two. Finally, the MAPE-RW algorithm is utilized to fuse the load prediction results of the two models to achieve an accurate prediction of the power load value.
2) The Morlet wavelet function is used to improve the TCN. The Morlet wavelet basis function is introduced as the activation function of the TCN residual block.
3) An improved WOA is proposed. The traditional WOA is improved by introducing a nonlinear convergence factor and adaptive weight, which improves the convergence speed and the convergence accuracy of the algorithm.
The paper has been organized in the following way. In Section 2, the network principles employed in the multi-model structure are introduced. In Section 3, the load prediction model structure of the multi-model fusion network is proposed. In Section 4, the experimental validation of the multi-model fusion network is performed and the experimental results are analyzed. Finally, the whole paper is summarized and future work to be carried out is presented.
As mentioned in [24], Dragomiretskiy et al. proposed VMD in 2014. It is an adaptive and completely non-recursive mode decomposition method. VMD has a more solid theoretical basis and can better suppress mode aliasing by controlling the bandwidth. The method is suitable for non-stationary sequence data and can decompose a data set into multiple stationary sub-sequences with different frequency scales. The VMD solution procedure is as follows.
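The constrained variational optimization problem is

$$
\begin{cases}
\min\limits_{\{u_k\},\{\omega_k\}} \left\{ \sum\limits_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \dfrac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \\
\text{s.t.} \ \sum\limits_{k=1}^{K} u_k(t) = f(t)
\end{cases}
\tag{2.1}
$$

where $\{u_k\}$ and $\{\omega_k\}$ denote the set of modes and their center frequencies after VMD, respectively, $f(t)$ is the original signal, $\ast$ denotes convolution and $K$ is the number of IMFs.

The penalty factor $\alpha$ and the Lagrange multiplier $\lambda$ are introduced into the constrained variational problem to transform it into the following unconstrained variational problem:

$$
L(u_k,\omega_k,\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
\tag{2.2}
$$

The unconstrained variational problem is solved by the alternating direction method of multipliers, which updates $u_k$, $\omega_k$ and $\lambda$ iteratively until the following convergence criterion is met:

$$
\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \gamma
\tag{2.3}
$$

where $\gamma$ is the allowable error, $n$ is the number of iterations, and $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}^{n}(\omega)$ are the Fourier transforms of $u_k^{n+1}(t)$, $f(t)$ and $\lambda^{n}(t)$, respectively.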
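As an illustration of the stopping rule in Eq. (2.3), the following minimal Python sketch sums the relative squared change of each mode between two iterations and compares it with the tolerance. The function name `vmd_converged`, the list-of-lists layout of the mode spectra and the default `gamma=1e-7` (borrowed from the tolerance reported later for the VMD layer) are assumptions made for illustration. It covers only the convergence test, not the mode and frequency updates.

```python
def vmd_converged(u_prev, u_next, gamma=1e-7):
    """Check the relative-change stopping rule for K modes.

    u_prev, u_next: lists of K modes; each mode is a list of real or
    complex spectrum samples from iterations n and n+1 (assumed layout).
    """
    total = 0.0
    for prev, nxt in zip(u_prev, u_next):
        # Squared L2 norm of the change of this mode.
        diff = sum(abs(b - a) ** 2 for a, b in zip(prev, nxt))
        # Squared L2 norm of the previous iterate (assumed non-zero).
        norm = sum(abs(a) ** 2 for a in prev)
        total += diff / norm
    return total < gamma
```

In a full solver this check would typically run once per iteration, after all modes have been updated.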
The TCN was first proposed by Bai et al. [25] in 2018 and is mainly employed for timing prediction, probability prediction, time prediction and traffic prediction. The TCN evolved from the CNN and can extract load data features effectively. In this paper, a TCN is employed within the multi-model fusion network to extract feature information from the time series and remove invalid features, which improves the accuracy of power load prediction. The TCN is composed of causal convolution, expansion (dilated) convolution and residual blocks [26].
Causal convolution adopts a one-dimensional fully convolutional network framework. Zero padding is introduced so that the input layer, hidden layers and output layer keep the same length, which avoids the loss of effective information. The output $y_t$ depends only on the current input $x_t$ and the inputs $(x_{t-1}, x_{t-2}, \cdots, x_{t-n})$ before time $t$. The convolutional calculation is shown in Figure 1.
Expansion convolution can increase the receptive field size of the output unit without increasing the number of parameters. The convolutional calculation method is as follows:
$$
F(s) = (x \otimes_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s-d\cdot i}
\tag{2.4}
$$

where $d$ is the expansion rate, $f$ is the filter, $k$ is the filter size and $x_{s-d\cdot i}$ covers the input at the current time and at historical times.
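The expansion convolution in Eq. (2.4) can be sketched in a few lines of plain Python. The function name `dilated_causal_conv` and the use of zero padding for indices before the start of the sequence are assumptions for illustration; the output keeps the length of the input, as described for causal convolution.

```python
def dilated_causal_conv(x, f, d):
    """Compute F(s) = sum_i f[i] * x[s - d*i] for every position s.

    x: input sequence, f: filter taps, d: expansion (dilation) rate.
    Positions before the start of x are treated as zeros (assumed padding).
    """
    k = len(f)
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i in range(k):
            idx = s - d * i
            if idx >= 0:  # only current and past inputs contribute
                acc += f[i] * x[idx]
        out.append(acc)
    return out
```

For example, with `f = [1, 1]` and `d = 2`, each output is the sum of the current input and the input two steps earlier, so a larger `d` widens the receptive field without adding filter taps.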
The core idea of the residual block is to introduce a "skip connection" across one or more layers; the network structure is shown in Figure 2. The left channel applies weight normalization, which accelerates gradient descent, and a nonlinear activation function. The right channel is a convolution on the direct connection, which ensures that the input and output data dimensions are consistent. The residual block output is

$$
h(x) = \mathrm{Activation}(x + F(x))
\tag{2.5}
$$

where $x$ and $h(x)$ are the input and output of the residual block, respectively, and $F(x)$ is the output of the left channel. The network output $h(x)$ is the result of linear transformation and activation function mapping.

The WTCN is based on the TCN topology, and the Morlet wavelet basis function is introduced into the residual block as its activation function. The Morlet wavelet basis function is expressed as

$$
y = \cos(1.75x)\, e^{-x^{2}/2}
\tag{2.6}
$$
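To make the role of the Morlet activation concrete, the sketch below evaluates Eq. (2.6) and applies it to the sum of a block input and a transformed input, following the residual form of Eq. (2.5). The names `morlet` and `residual_block`, and the `transform` callable standing in for the convolutional branch, are hypothetical and chosen only for illustration.

```python
import math

def morlet(x):
    """Morlet wavelet basis function: cos(1.75 x) * exp(-x^2 / 2)."""
    return math.cos(1.75 * x) * math.exp(-x * x / 2.0)

def residual_block(x, transform):
    """Apply h(x) = Activation(x + F(x)) element-wise with a Morlet activation.

    x: list of floats; transform: callable returning F(x) with the same
    length as x (a stand-in for the convolutional branch).
    """
    fx = transform(x)
    return [morlet(xi + fi) for xi, fi in zip(x, fx)]
```

Unlike ReLU, this activation is bounded and oscillatory, and it decays towards zero for large absolute inputs.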
The GRU network has only two gating units and is therefore simpler than the LSTM network. It inherits the advantages of the LSTM network and improves the training speed while preserving training accuracy [27]. The GRU network structure is shown in Figure 3. By changing the GRU network into a bidirectional GRU (BiGRU) network, information can be transmitted bidirectionally in the network layer, and the prediction accuracy of the network model is effectively improved [28]. The BiGRU network structure is shown in Figure 4.
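The GRU and BiGRU calculation formulas are

$$
\begin{cases}
z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \\
r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \\
\tilde{h}_t = \tanh\left(W_{\tilde{h}} \cdot [r_t * h_{t-1}, x_t]\right) \\
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
\end{cases}
\tag{2.7}
$$

$$
\begin{cases}
\overrightarrow{h}_t = \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t = \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t-1}\right) \\
h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t
\end{cases}
\tag{2.8}
$$

where $z_t$ is the update gate and $r_t$ is the reset gate, both of which are jointly determined by the input $x_t$, the hidden layer output $h_{t-1}$ at the previous moment and the activation function $\sigma$; $\tilde{h}_t$ is the candidate hidden state and $h_t$ is the hidden layer output. $W_z$, $W_r$ and $W_{\tilde{h}}$ are trainable parameter matrices. $\overrightarrow{h}_t$ is the state of the forward hidden layer, $\overleftarrow{h}_t$ is the state of the backward hidden layer, $w_t$ and $v_t$ are their respective weights and $b_t$ is the bias of the hidden layer at the current time.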
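The gate equations can be traced step by step with a small, dependency-free sketch. It follows the bias-free form of the GRU equations given above and merges a forward and a backward pass with a weighted sum. The helper names (`gru_step`, `bigru`), the list-based matrices and the use of constant scalar merge weights `w`, `v` and `b` are simplifying assumptions, not the network used in the experiments.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU update; each matrix has hidden rows and hidden+input columns."""
    concat = h_prev + x_t                                # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(Wz, concat)]         # update gate
    r = [sigmoid(a) for a in matvec(Wr, concat)]         # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(a) for a in matvec(Wh, gated)]  # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_tilde)]

def bigru(xs, h0, fwd, bwd, w=0.5, v=0.5, b=0.0):
    """Run forward and backward GRUs and merge them as w*h_fwd + v*h_bwd + b."""
    h, forward = h0, []
    for x in xs:
        h = gru_step(x, h, *fwd)
        forward.append(h)
    h, backward = h0, []
    for x in reversed(xs):          # the backward pass reads the sequence in reverse
        h = gru_step(x, h, *bwd)
        backward.append(h)
    backward.reverse()              # realign backward states with time order
    return [[w * f + v * g + b for f, g in zip(hf, hb)]
            for hf, hb in zip(forward, backward)]
```

Here `fwd` and `bwd` are `(Wz, Wr, Wh)` triples, so the two directions keep separate parameters.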
The attention mechanism is an intuitive method that imitates the human visual mechanism. It is often exploited in deep learning tasks such as natural language processing, image analysis and load prediction. Human vision deliberately attends to the critical information of an object and ignores irrelevant information. Likewise, adding an attention mechanism with a weight allocation principle to the network model effectively preserves the relevant time-series information [29]. The structure of attention is shown in Figure 5.
The WOA is a bionic meta-heuristic optimization algorithm proposed by the Australian scholars Mirjalili and Lewis in 2016, based on the predation behavior of humpback whales. The algorithm performs local search by imitating the whales' hunting behavior and realizes global search through a random search strategy. The WOA is fast and precise in solving model parameter optimization problems, so it has wide application prospects. Nevertheless, as power load data and influencing factors increase, the traditional WOA shows some limitations in coordinating global search and local exploitation. In particular, the linearly decreasing convergence factor $a$ of the WOA cannot reflect the optimization process well. Therefore, a nonlinear convergence factor $a$ is proposed:
$$
a = 2 - 2\sin\left(u\,\frac{t}{\mathrm{max\_iter}}\,\pi + \varphi\right)
\tag{2.9}
$$

where $u$ and $\varphi$ are preset parameters, here $u=2$ and $\varphi=0$, $t$ is the current iteration and max_iter is the maximum number of iterations. At the initial stage of training, when the value of $a$ is large, the slowly decreasing convergence factor effectively widens the search range for the optimal parameters. As the number of iterations increases, the convergence factor decreases faster and the convergence speed accelerates.
Introducing the nonlinear factor $a$ improves the performance of the algorithm. However, the traditional WOA does not make effective use of the whale position vector, which reduces the flexibility of the population and affects the optimization result. In this paper, the adaptive weight $\omega$ is introduced to enhance the global search capability of the algorithm and increase the population diversity of the WOA. The formula for calculating $\omega$ is as follows:

$$
\omega = 0.2\cos\left(\frac{\pi}{2}\left(1-\frac{t}{\mathrm{max\_iter}}\right)\right)
\tag{2.10}
$$
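The two schedules can be evaluated directly to see how they evolve over the iterations. The sketch below implements the formulas for the convergence factor and the adaptive weight exactly as written, with `u` and `phi` left as parameters. The function names are hypothetical, and because the text does not state how ω enters the whale position update, the sketch only computes the schedules and does not implement the IWOA itself.

```python
import math

def convergence_factor(t, max_iter, u=2.0, phi=0.0):
    """Nonlinear convergence factor a = 2 - 2*sin(u * t/max_iter * pi + phi)."""
    return 2.0 - 2.0 * math.sin(u * t / max_iter * math.pi + phi)

def adaptive_weight(t, max_iter):
    """Adaptive weight omega = 0.2 * cos(pi/2 * (1 - t/max_iter))."""
    return 0.2 * math.cos(math.pi / 2.0 * (1.0 - t / max_iter))
```

Tabulating both functions for `t` from 0 to `max_iter` is a quick way to check a chosen `u` and `phi` against the intended search behavior before running the optimizer.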
In this paper, the performance of the IWOA is verified on the benchmark test function $f(x)=\sum_{i=1}^{n}\left[x_i^2-10\cos(2\pi x_i)+10\right]$. The number of iterations of the algorithm is set to 500 and the dimension of the benchmark function is 30. To ensure the reliability of the optimization results, the average of 10 experimental runs is reported. Three algorithms are compared: the WOA, the WOA with the nonlinear convergence factor (NWOA) and the IWOA with both adaptive weights and the nonlinear convergence factor. The experimental results are shown in Figure 6. They show that the IWOA improves not only the convergence speed of the algorithm but also its convergence accuracy. The flow chart of the improved WOA is shown in Figure 7.
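The benchmark above is the Rastrigin function, whose global minimum is 0 at the origin. The sketch below implements it together with a small harness that averages the best value over several runs, mirroring the protocol of 10 runs, 500 iterations and 30 dimensions. The uniform random search is only a stand-in optimizer, and the search bound of 5.12 is the range commonly used for this function, not a setting stated in the text; all function names are hypothetical.

```python
import math
import random

def rastrigin(x):
    """f(x) = sum_i [x_i^2 - 10*cos(2*pi*x_i) + 10]."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def random_search(dim, iters, rng, bound=5.12):
    """Stand-in optimizer: best value among uniformly sampled candidates."""
    best = float("inf")
    for _ in range(iters):
        candidate = [rng.uniform(-bound, bound) for _ in range(dim)]
        best = min(best, rastrigin(candidate))
    return best

def mean_best(runs=10, dim=30, iters=500, seed=0):
    """Average the best objective value over independent runs."""
    rng = random.Random(seed)
    return sum(random_search(dim, iters, rng) for _ in range(runs)) / runs
```

Swapping `random_search` for a WOA variant with the same signature would give a like-for-like comparison under this protocol.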
CatBoost is a machine learning library that the Russian search company Yandex open-sourced in 2017, and it is an improvement on the gradient boosting decision tree (GBDT) algorithm [30]. The CatBoost algorithm has fewer parameters than the GBDT algorithm. It effectively addresses the problems of gradient bias and prediction shift, reduces the risk of model overfitting and improves the generalization ability of the algorithm. CatBoost is often used in data mining and load forecasting.
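Noise and low-frequency data interference can be effectively reduced by adding a prior distribution term to the gradient boosting decision tree. The calculation is as follows:

$$
\hat{x}_k^i = \frac{\sum\limits_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_i,k}\right]\cdot Y_{\sigma_j} + a\cdot P}{\sum\limits_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_i,k}\right] + a}
\tag{2.11}
$$

where $\sigma$ is a random permutation of the samples, the sums run over the $p-1$ samples that precede the current one in this permutation, $[\cdot]$ equals 1 when the condition holds and 0 otherwise, $Y_{\sigma_j}$ is the label of sample $\sigma_j$, $a$ is the weight coefficient of the prior term and $P$ is the prior term. When solving regression problems, CatBoost usually takes the mean of the data set as the prior term.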
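The prior-smoothed encoding can be illustrated with a short sketch of an ordered target statistic: each categorical value is replaced by the smoothed mean of the targets of earlier samples with the same category. It assumes the samples are already arranged in the random order used for encoding; the function name and arguments are hypothetical, and this is a simplified illustration rather than CatBoost's internal implementation.

```python
def ordered_target_statistic(categories, targets, prior, a=1.0):
    """Encode categories using only the samples that come before each one.

    categories, targets: sequences in the (assumed) permutation order.
    prior: prior term, e.g. the mean target for regression.
    a: weight of the prior term.
    """
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)      # sum of earlier targets with this category
        n = counts.get(cat, 0)      # number of earlier samples with this category
        encoded.append((s + a * prior) / (n + a))
        sums[cat] = s + y           # update only after encoding the current sample
        counts[cat] = n + 1
    return encoded
```

Because a sample's own target never enters its encoding, the first occurrence of any category falls back to the prior, which is what dampens the effect of rare categories.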
The MAPE-RW algorithm fuses different models according to their prediction errors and outputs the optimal prediction result. The proportion of each model's predicted value is determined by its weight. The final predicted value is calculated as follows:

$$
\begin{cases}
\omega_i = \dfrac{M_j}{M_i + M_j} \\
f_{final} = \omega_{VTWBA}\, f_{VTWBA} + \omega_{Catboost}\, f_{Catboost}
\end{cases}
\tag{2.12}
$$

where $\omega_i$ is the weight of model $i$, $M_i$ and $M_j$ are the MAPE values of model $i$ and of the other model $j$, and $f_{final}$ is the final prediction output of the multi-model fusion. $f_{Catboost}$ and $f_{VTWBA}$ are the predicted outputs of the CatBoost model and the VMD-decomposed WTCN-IWOA-BiGRU-attention model, respectively. Each model's weight is thus proportional to the reciprocal of its own MAPE, so the more accurate model contributes more to the fused prediction.
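A minimal sketch of reciprocal-MAPE fusion for two models follows. It assumes that each model's prediction is multiplied by its own weight, so the model with the smaller MAPE receives the larger weight, and that the MAPE values come from a held-out set. The function names are hypothetical.

```python
def mape_rw_weights(mape_a, mape_b):
    """Return (w_a, w_b), each proportional to the reciprocal of its model's MAPE."""
    total = mape_a + mape_b
    return mape_b / total, mape_a / total

def fuse(pred_a, pred_b, mape_a, mape_b):
    """Fuse two prediction series with the reciprocal-MAPE weights."""
    w_a, w_b = mape_rw_weights(mape_a, mape_b)
    return [w_a * pa + w_b * pb for pa, pb in zip(pred_a, pred_b)]
```

For instance, `mape_rw_weights(1.0, 3.0)` gives `(0.75, 0.25)`: the model with one third of the error gets three times the weight.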
There are many factors influencing the power load in time-series data, so traditional prediction models cannot extract the underlying data features effectively. In this paper, a multi-model fusion short-term power load forecasting model is proposed by combining a deep learning algorithm and a machine learning algorithm. By combining the advantages of different algorithms, the characteristic information in the data can be effectively mined to improve the accuracy of load prediction. The prediction model design of the multi-model fusion network is shown in Figure 8.
1) Data processing. The model validation analysis is carried out on a public power load data set from 2006 to 2010 in an Australian region. This data set contains six-dimensional feature vectors, and the feature parameters are shown in Table 1 below. The data are sampled every 30 min, and load prediction is carried out with a sliding window of size 10 and step size 1. Therefore, 10 sets of historical data are employed to predict the electric load value at the next moment. The data set is divided into a training set, validation set and test set at a ratio of 3:1:1. The multi-model fusion network employs the validation set for parameter tuning during training. To make the model evaluation more accurate, the test set does not participate in the network model training.
Characteristic parameters | Parameter types | Character descriptions |
Date | Time | Samples were taken every 30 minutes |
Weather factors | Dew-point temperature | Equilibrium temperature |
Weather factors | Dry-bulb temperature | Aerothermodynamic temperature |
Weather factors | Wet-bulb temperature | Thermodynamic saturation temperature |
Weather factors | Humidity | Degree of atmospheric dryness |
Economic factors | Electricity price | Price per kWh |
2) Input layer. Characteristic data and power load data are used as input for the prediction model. Missing values in the input data of length n are filled, and the data are normalized before entering the prediction model.
3) VMD layer. The power load data are employed as the input of the VMD layer. The long time-series data are input into prediction Model one after missing-value filling and normalization. In the VMD, the values of k and alpha are determined from the central frequencies of the decomposition, which are calculated for different values of k and alpha. Choosing a reasonable value of k avoids mode mixing and also reduces the number of network parameters of Model one. As can be seen from Table 2, at k = 5 and alpha = 1850, the central frequencies are relatively stable with the least number of decomposition layers, so further training involves fewer parameters and the model trains faster. Therefore, the VMD penalty factor is set to alpha = 1850, the convergence tolerance to tol = 1e−7 and the number of decomposition modes to k = 5. The decomposition of each mode is shown in Figure 9.
k | Simulated signal center frequency | |||||
2 | 0.0001 | 0.02243 | ||||
3 | 0.0001 | 0.02080 | 0.04274 | |||
4 | 0.0001 | 0.02076 | 0.04167 | 0.06471 | ||
5 | 0.0001 | 0.02076 | 0.04162 | 0.06469 | 0.34702 | |
6 | 0.0001 | 0.02076 | 0.04162 | 0.06238 | 0.08467 | 0.35872 |
4) IWOA-based hyperparameter optimization. The optimal hyperparameters are obtained by employing the IWOA. To keep the optimization search within the range of valid parameter selections, the ranges of the network parameter values are defined as shown in Table 3. The components generated by the VMD of the load data, together with the weather characteristics, are the inputs of the WTCN-BiGRU-attention network, and the IWOA is employed to optimize the network hyperparameters for each component. The optimal hyperparameters found for each component are shown in Table 4.
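Category | Parameters | Parameter values |
Parameter settings | Population size | 5 |
Parameter settings | Max iterations | 5 |
Parameter settings | Constant b | 2 |
Search for upper and lower bounds | Learn rate | [0.001, 0.01] |
Search for upper and lower bounds | Epoch | [10, 100] |
Search for upper and lower bounds | Batchsize | [16, 128] |
Search for upper and lower bounds | BiGRU node number | [1, 20] |
Search for upper and lower bounds | Number of nodes at the full | [1, 100] |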
Modal tags | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 |
Learn rate | 0.0047 | 0.0027 | 0.0064 | 0.0054 | 0.0024 |
Epoch | 38 | 26 | 61 | 70 | 89 |
Batchsize | 72 | 19 | 72 | 37 | 22 |
BiGRU node number | 8 | 18 | 8 | 17 | 2 |
BiGRU node number | 16 | 11 | 7 | 17 | 19 |
Number of nodes at the full | 49 | 4 | 46 | 14 | 97 |
5) WTCN layer. The influential factors of the load characteristics are added to each of the modes decomposed by the VMD layer. The Morlet wavelet function is used as the activation function of the residual block. The network extracts load characteristics and influencing factors through the WTCN layer, and the weights of the convolutional kernels are normalized. The dropout coefficient is set to 0.2 to prevent over-fitting of the model, the expansion coefficients are set to (1, 2, 4, 8, 16, 32) and the number of filters is set to 128.
6) BiGRU layer. The model builds two BiGRU layers to learn the features extracted by the WTCN, make full use of the data features and capture their internal change rules.
7) Attention layer. The input of the attention mechanism is the output of the two-layer BiGRU network. The proportions of the different feature vectors are calculated according to the weight allocation principle, and the optimal weight parameter matrix is found through continuous updates and iteration.
8) CatBoost prediction model. A random search algorithm is employed to select the CatBoost network hyperparameters. The optimal network hyperparameters are shown in Table 5. The input power load and weather characteristic factors are modeled.
Verbose | Learning rate | Iterations | Depth |
50 | 0.03 | 900 | 12 |
9) Output layer. The IWOA-WTCN-BiGRU-attention network is set as Model one, and the CatBoost network is set as Model two. The MAPE-RW algorithm was exploited to calculate the weight of the output results of Model one and Model two. Finally, the load prediction output of the multi-model fusion network is obtained by effective fusion of the model prediction results.
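The data preparation described in step 1) can be sketched as follows: a sliding window of size 10 and step 1 turns the series into (history, next value) pairs, which are then split 3:1:1. The function names are hypothetical, and the chronological (unshuffled) split is an assumption, since the text does not say whether the samples are shuffled before splitting.

```python
def make_windows(series, window=10, step=1):
    """Build (history, target) pairs: `window` past values predict the next one."""
    samples = []
    for start in range(0, len(series) - window, step):
        history = series[start:start + window]
        target = series[start + window]
        samples.append((history, target))
    return samples

def split_3_1_1(samples):
    """Split samples into train/validation/test parts at a 3:1:1 ratio, in order."""
    n = len(samples)
    n_train = n * 3 // 5
    n_val = n // 5
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test
```

Keeping the split in time order means the test set always lies after the training data, which avoids leaking future load values into training.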
The Adam optimization algorithm is selected as the parameter optimization method of network Model one. Adam is a first-order optimization algorithm that can effectively replace the traditional gradient descent process. It iteratively updates the network weights according to the data so that the loss function is optimized. The loss function of the model is the mean square error:

$$
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2
\tag{3.1}
$$

where $N$ is the number of samples; $y_i$ and $\hat{y}_i$ are the actual load value and predicted load value of sample $i$, respectively.
The min-max normalization method is used to normalize the original data and increase the training speed of the model. Inverse normalization of the predicted data makes the comparison between the predicted and real values more intuitive. The normalization formula is

$$
x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\tag{4.1}
$$

where $x$ is the original load data, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the sample data, respectively, and $x_n$ is the normalized data.
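Min-max normalization and its inverse can be written as a pair of small functions. The names are hypothetical, and fitting the minimum and maximum on one set and reusing them elsewhere (for example, fitting on the training data only) is a common practice assumed here rather than something stated in the text.

```python
def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    return min(values), max(values)

def normalize(values, x_min, x_max):
    """Map values to [0, 1] with x_n = (x - x_min) / (x_max - x_min)."""
    return [(x - x_min) / (x_max - x_min) for x in values]

def denormalize(values, x_min, x_max):
    """Invert the scaling so predictions are back in the original load units."""
    return [v * (x_max - x_min) + x_min for v in values]
```

Predictions made in the normalized space should be passed through `denormalize` with the same `x_min` and `x_max` before computing errors in physical units.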
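The RMSE, MAPE, mean absolute error (MAE) and R-square are utilized as evaluation indexes. The calculation formulas are as follows:

$$
\begin{cases}
\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum\limits_{i=1}^{N}\left(\tilde{x}_i - x_i\right)^2} \\
\mathrm{MAPE} = \dfrac{100}{N}\sum\limits_{i=1}^{N}\left|\dfrac{\tilde{x}_i - x_i}{\tilde{x}_i}\right| \\
\mathrm{MAE} = \dfrac{1}{N}\sum\limits_{i=1}^{N}\left|\tilde{x}_i - x_i\right| \\
R\text{-square} = 1 - \dfrac{\sum\limits_{i=1}^{N}\left(\tilde{x}_i - x_i\right)^2}{\sum\limits_{i=1}^{N}\left(\tilde{x}_i - \bar{x}\right)^2}
\end{cases}
\tag{4.2}
$$

where $N$ is the number of samples, $\tilde{x}_i$ is the true value of the $i$th sample point, $x_i$ is the predicted value of the $i$th sample point and $\bar{x}$ is the mean of the true values.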
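The four evaluation indexes can be computed together in a few lines. The sketch assumes the standard definitions, in particular that the MAPE divides each error by the true value and that R-square uses the mean of the true values; the function name `evaluate` and the returned dictionary keys are hypothetical.

```python
import math

def evaluate(true, pred):
    """Return RMSE, MAPE (in percent), MAE and R-square for two equal-length lists."""
    n = len(true)
    errors = [t - p for t, p in zip(true, pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true)  # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, true)),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

The MAPE is undefined when a true value is zero, which is not a concern for regional load data but matters if the function is reused on other series.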
The prediction results of the proposed model are compared with those of traditional single models and hybrid deep learning models, namely the GRU, LSTM, TCN, WTCN, WTCN-GRU, WTCN-LSTM, TCN-BiGRU, WTCN-BiGRU and TCN-BiGRU-attention models. The load forecasts for December 20, 2010, are plotted to show more visually the accuracy advantage of the proposed power load forecast model. The load forecasting curves are shown in Figure 10, where different curves represent the prediction results and trends of the different prediction models. As can be seen from Figure 10, the predictions of the proposed model are more accurate, more stable and closer to the real load. Table 6 shows the evaluation indexes of each model on the test set.
Prediction model | RMSE | MAPE | MAE | R-square |
IWOA-WTCN-BiGRU-attention | 110.964 | 0.914 | 82.189 | 0.993 |
CatBoost | 89.089 | 0.727 | 63.407 | 0.996 |
Fusion model of this paper | 77.495 | 0.632 | 56.103 | 0.997 |
WTCN-BiGRU-attention | 103.452 | 0.86 | 76.179 | 0.994 |
WTCN-BiGRU | 97.573 | 0.803 | 69.903 | 0.995 |
WTCN-GRU | 96.584 | 0.814 | 70.868 | 0.995 |
WTCN-LSTM | 100.309 | 0.843 | 73.649 | 0.995 |
WTCN | 103.558 | 0.869 | 75.975 | 0.994 |
TCN | 103.969 | 0.89 | 78.024 | 0.994 |
LSTM | 147.665 | 1.261 | 108.342 | 0.988 |
GRU | 109.991 | 0.932 | 82.114 | 0.994 |
CNN | 337.913 | 3.241 | 275.52 | 0.939 |
The forecasting performance of the different models was also evaluated month by month. Table 7 shows the error values (RMSE) of the monthly forecasting results of the different models. For the evaluation metrics, smaller MAPE, MAE and RMSE values indicate better forecasting performance, and a larger R-square value indicates that the predicted values are closer to the real values. From the data in Table 6 and Figure 10, the following conclusions can be drawn:
RMSE | Model one | Model two | Fusion model | WT-BG-a | WT-BG | WT-G | WT-L | WT |
Jan. | 108.372 | 106.039 | 97.048 | 126.77 | 116.382 | 96.321 | 107.057 | 121.456 |
Feb. | 102.257 | 84.791 | 76.02 | 108.625 | 89.625 | 90.072 | 88.652 | 103.947 |
Mar. | 85.485 | 72.585 | 61.48 | 84.297 | 76.175 | 80.666 | 85.047 | 86.109 |
Apr. | 96.362 | 106.033 | 83.271 | 105.568 | 110.002 | 108.207 | 109.236 | 113.008 |
May | 121.924 | 95.097 | 80 | 101.731 | 101.456 | 101.066 | 100.091 | 108.432 |
Jun. | 141.496 | 86.97 | 82.599 | 100.258 | 91.622 | 93.661 | 98.118 | 104.279 |
Jul. | 131.409 | 86.787 | 77.983 | 101.809 | 87.982 | 92.821 | 95.962 | 100.363 |
Aug. | 170.315 | 84.826 | 102.867 | 105.915 | 103.802 | 105.615 | 112.917 | 107.68 |
Sept. | 127.524 | 83.933 | 81.581 | 104.565 | 99.724 | 99.005 | 101.172 | 100.071 |
Oct. | 85.224 | 89.977 | 65.933 | 95.965 | 93.855 | 99.785 | 101.756 | 97.116 |
Nov. | 79.742 | 84.105 | 62.097 | 99.21 | 95.347 | 101.174 | 106.564 | 100.065 |
Dec. | 79.927 | 82.657 | 63.727 | 103.142 | 101.035 | 92.024 | 101.432 | 101.644 |
1) Compared with VMD-IWOA-WTCN-BiGRU-attention and CatBoost alone, combining the two models as proposed gives more accurate predictions. Relative to these two models, the RMSE decreased by 33.469 and 11.594, the MAPE by 0.282 and 0.095 percentage points and the MAE by 26.086 and 7.304, respectively. The reason is that VMD-IWOA-WTCN-BiGRU-attention has a large prediction deviation in hours of load fluctuation because VMD loses part of the data, whereas the CatBoost model is more accurate for the stationary components but deviates more when the data fluctuate strongly. The MAPE-RW algorithm is therefore used to combine the advantages of the two models and obtain a more accurate prediction.
2) Compared with the other independent prediction models, the predictions of the proposed model are closer to the real values. Compared with the WTCN-BiGRU prediction model, the RMSE decreased by 20.078, the MAPE by 0.171 percentage points and the MAE by 13.8 (69.903 versus 56.103 in Table 6). The underlying combination models thus also train well, but their prediction accuracy is lower.
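This section does not spell out the MAPE-RW fusion step. One plausible reading, assumed here, is reciprocal weighting: each model's weight is proportional to the inverse of its MAPE, so the model with the smaller error contributes more. The sketch below illustrates that reading only and may differ from the authors' algorithm; the function names and the forecast values are hypothetical, and the two MAPE values are borrowed from Table 6 purely as example inputs.

```python
def reciprocal_mape_weights(mapes):
    """Weights proportional to 1 / MAPE, normalized to sum to 1 (assumed scheme)."""
    inverse = [1.0 / m for m in mapes]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(predictions, weights):
    """Weighted sum of per-model forecasts, one fused value per time step."""
    return [
        sum(w * p for w, p in zip(weights, step))
        for step in zip(*predictions)
    ]


# Example MAPE inputs: Model one and Model two rows of Table 6.
mapes = [0.914, 0.727]
weights = reciprocal_mape_weights(mapes)

# Hypothetical forecasts for two time steps (not from the paper's data).
model_one = [8000.0, 8200.0]
model_two = [8100.0, 8150.0]
fused = fuse([model_one, model_two], weights)
```

Under this scheme the weights would normally be derived from validation-set errors rather than test-set errors, so that the reported test metrics remain an unbiased estimate.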
To verify the feasibility and accuracy of the model in a different forecasting area, we used the measured 2018 power generation of the Ningxia Wuzhong Sun Mountain photovoltaic (PV) power plant in China for PV power generation prediction, together with five types of environmental data, i.e., total solar irradiation, PV panel module temperature, ambient temperature, atmospheric pressure and relative humidity, measured by the environmental detector corresponding to this PV array. The data were collected at 15-minute intervals. Since the PV array only generates power during the daytime, the valid daily data from 7:30 to 16:30 were selected as the model validation data. The data were divided in a 10:1:1 ratio: the first 10 months were used as training data, November as validation data during training and December as the test set. The prediction model parameters were set in the same way as for the power load prediction model. The hyperparameters of the WTCN-BiGRU-attention network for the PV power generation prediction model were selected by the WOA algorithm, as shown in Table 8, and the best hyperparameters of the CatBoost network were selected by the random search algorithm, as shown in Table 9.
Modal tags | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 |
Learn rate | 0.0038 | 0.0012 | 0.0059 | 0.0067 | 0.0030 |
Epoch | 72 | 54 | 50 | 73 | 96 |
Batchsize | 32 | 57 | 118 | 117 | 70 |
BiGRU node number | 5 | 1 | 16 | 10 | 3 |
BiGRU node number | 6 | 7 | 15 | 13 | 12 |
Number of nodes at the full | 44 | 12 | 54 | 84 | 12 |
Verbose | Learning rate | Iterations | Depth |
50 | 0.05 | 500 | 9 |
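The daytime filtering and the chronological 10:1:1 split described above can be sketched with the standard library alone. The code assumes that both window endpoints (7:30 and 16:30) are included and uses placeholder power values; the function names and the synthetic timestamps are illustrative and not taken from the paper.

```python
from datetime import datetime, time, timedelta

DAY_START, DAY_END = time(7, 30), time(16, 30)


def is_daytime(ts):
    """True if the timestamp lies in the 7:30 to 16:30 window (endpoints included)."""
    return DAY_START <= ts.time() <= DAY_END


def split_by_month(records):
    """Split (timestamp, value) pairs: Jan-Oct train, Nov validation, Dec test."""
    train, val, test = [], [], []
    for ts, value in records:
        if not is_daytime(ts):
            continue  # drop night-time samples, when the array produces no power
        if ts.month <= 10:
            train.append((ts, value))
        elif ts.month == 11:
            val.append((ts, value))
        else:
            test.append((ts, value))
    return train, val, test


# Synthetic 15-minute timestamps covering 2018, with placeholder values of 0.0.
records = []
ts = datetime(2018, 1, 1)
while ts.year == 2018:
    records.append((ts, 0.0))
    ts += timedelta(minutes=15)

train, val, test = split_by_month(records)
```

Splitting by calendar month rather than at random keeps the test period strictly after the training period, which matches how a forecasting model would be used.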
The prediction results of the model were compared with those of the hybrid neural network models WTCN-BiGRU-attention, TCN-BiGRU-attention and CNN-BiGRU-attention. The daily PV power generation forecasts for two days are plotted to show the advantages of the proposed multi-model fusion forecasting network more intuitively. The prediction curves are shown in Figures 11 and 12, where the different curves represent the prediction results and trends of the different models. The figures show that the proposed multi-model fusion forecasting network has higher accuracy. Table 10 shows the overall test-set evaluation indexes of each model.
Prediction model | RMSE | MAPE | MAE | R-square |
Model one | 1.996 | 24.380 | 1.190 | 0.964 |
Model two | 1.434 | 13.119 | 1.023 | 0.982 |
Fusion model of this paper | 1.285 | 12.285 | 0.924 | 0.985 |
WTCN-BiGRU-attention | 3.737 | 26.8 | 2.315 | 0.875 |
TCN-BiGRU-attention | 3.739 | 23.808 | 2.466 | 0.875 |
CNN-BiGRU-attention | 3.738 | 26.243 | 2.365 | 0.875 |
To sum up, the proposed multi-model fusion short-term power load prediction model has better prediction performance and relatively more stable prediction results, meaning that it can be better used to predict power load data with multidimensional feature inputs.
The major objective of this study was to build a forecasting model that integrates multiple models to improve the accuracy of power load forecasting. In Model one, the data are decomposed into multiple components by VMD, and an IWOA is then used to optimize the hyperparameters of the WTCN-BiGRU-attention network model. In parallel, Model two predicts the multi-dimensional load data with the CatBoost algorithm. Finally, the MAPE-RW algorithm is employed to fuse the prediction results of the two models to achieve accurate prediction of short-term power load data. Taking multidimensional load data of an area in Australia as an example, a feasibility verification analysis was carried out, and the main conclusions are as follows:
1) Based on the power load data of a region in Australia, we constructed a multi-dimensional power load feature set, and Model one was used to better predict the strongly fluctuating non-stationary components of the power load data.
2) The stationary components of the multi-dimensional power load data are predicted by Model two, and the prediction results of the two models are fused by the MAPE-RW algorithm, which improves the power load prediction accuracy of the multi-model fusion.
To sum up, this paper proposes a hybrid neural network combined model for predicting power load data with multidimensional characteristics. This research not only provides a reference and options for short-term power load forecasting methods, but is also a useful reference for other power fields, such as wind power generation forecasting and energy storage unit service-life forecasting. However, the structure of the multi-model fusion network is overly complex: while it improves the accuracy of power load prediction, it increases the prediction time and consumes more computing resources. Therefore, in the future, the authors will work on designing a more concise interval prediction model for power load with suitable accuracy. Besides reducing the training time, interval prediction makes the prediction results more meaningful for the power sector when conducting power dispatching and helps to avoid wasting power resources.
This work was supported by the State Grid Corporation of China Headquarters Science and Technology Project (5400-202122573A-0-5-SF). The authors thank the editors and the anonymous reviewers for their helpful comments and suggestions that have improved the presentation of this manuscript.
The authors declare that there is no conflict of interest.
[1] |
Abdulrahman UFI, Panford JK, Hayfron-Acquah JB (2014) Fuzzy logic approach to credit scoring for micro finances in Ghana: a case study of KWIQPUS money lending. Int J Comput Appl 94: 11–18. https://doi.org/10.5120/16362-5772 doi: 10.5120/16362-5772
![]() |
[2] | Vahid PR, Ahmadi A (2016) Modeling corporate customers' credit risk considering the ensemble approaches in multiclass classification: evidence from Iranian corporate credits. J Credit Risk 12: 71–95. |
[3] | Allen L, Peng L, Shan Y (2020) Social networks and credit allocation on fintech lending platforms. working paper, social science research network. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id = 3537714. |
[4] | Altman EI (1968) Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J Financ 23: 589–609. Available from: https://pdfs.semanticscholar.org/cab5/059bfc5bf4b70b106434e0cb665f3183fd4a.pdf. |
[5] | Altman EI, Narayanan P (1997) An international survey of business failure classification models. in financial markets, institutions and instruments. New York: New York University Salomon Center, 6. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0416.00010. |
[6] | American Banking Association (2018) New credit score unveiled drawing on bank account data. ABA Bank J, October 22. |
[7] |
Anagnostou I, Kandhai D, Sanchez Rivero J, et al. (2020) Contagious defaults in a credit portfolio: a Bayesian network approach. J Credit Risk 16: 1–26. https://dx.doi.org/10.2139/ssrn.3446615 doi: 10.2139/ssrn.3446615
![]() |
[8] |
Bjorkegren D, Grissen D (2020) Behavior revealed in mobile phone usage predicts credit repayment. World Bank Econ Rev 34: 618–634. https://doi.org/10.1093/wber/lhz006 doi: 10.1093/wber/lhz006
![]() |
[9] | Bonds D (1999) Modeling term structures of defaultable bonds. Rev Financ Stud 12: 687–720. Available from: https://academic.oup.com/rfs/article-abstract/12/4/687/1578719?redirectedFrom = fulltext. |
[10] | Breeden J (2021) A survey of machine learning in credit risk. J Risk Model Validat 17: 1–62. |
[11] | Chava S, Jarrow RA (2004) Bankruptcy prediction with industry effects. Rev Financ 8: 537–69. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id = 287474. |
[12] | Clemen R (1989) Combining forecasts: a review and annotated bibliography. Int J Forecast 5: 559–583 Available from: https://people.duke.edu/~clemen/bio/Published%20Papers/13.CombiningReview-Clemen-IJOF-89.pdf. |
[13] | Coats PK, Fant LF (1993) Recognizing financial distress patterns using a neural network tool. Financ Manage 22: 142–155. Available from: https://ideas.repec.org/a/fma/fmanag/coats93.html. |
[14] | Collobert R, Weston J (2008) A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Neural Information Processing Systems, 1: 160–167. Association for Computing Machinery, New York. https://doi.org/10.1145/1390156.1390177 |
[15] | Duffie D, Singleton KJ (1999) Simulating correlated defaults. Paper presented at the Bank of England Conference on Credit Risk Modeling and Regulatory Implications Working Paper, Stanford University. Available from: https://kenneths.people.stanford.edu/sites/g/files/sbiybj3396/f/duffiesingleton1999.pdf. |
[16] | Dwyer DW, Kogacil AE, Stein RM (2004) Moody's KMV RiskCalcTM v2.1 Model. Moody's Analytics. Available from: https://www.moodys.com/sites/products/productattachments/riskcalc%202.1%20whitepaper.pdf. |
[17] | Harrell FE Jr (2018) Road Map for Choosing Between Statistical Modeling and Machine Learning, Stat Think. Available from: http://www.fharrell.com/post/stat-ml. |
[18] |
Jacobs Jr M (2022a) Quantification of model risk with an application to probability of default estimation and stress testing for a large corporate portfolio. J Risk Model Validat 15: 1–39. https://doi.org/10.21314/JRMV.2022.023 doi: 10.21314/JRMV.2022.023
![]() |
[19] |
Jacobs Jr M (2022b) Validation of corporate probability of default models considering alternative use cases and the quantification of model risk. Data Sci Financ Econ 2: 17–53. https://doi.org/10.3934/DSFE.2022002 doi: 10.3934/DSFE.2022002
![]() |
[20] |
Jarrow RA, Turnbull SM (1995) Pricing derivatives on financial securities subject to credit risk. J Fnanc 50: 53–85. https://doi.org/10.1111/j.1540-6261.1995.tb05167.x doi: 10.1111/j.1540-6261.1995.tb05167.x
![]() |
[21] | Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, 1: 1097–1105. Available from: https://cse.iitk.ac.in/users/cs365/2013/hw2/krizhevsky-hinton-12-imagenet-convolutional-NN-deep.pdf. |
[22] | Kumar IE, Venkatasubramanian S, Scheidegger C, et al. (2020) Problems with Shapley-Value-Based Explanations as Feature Importance Measures. In International Conference on Machine Learning Research: 5491–5500. Available from: https://proceedings.mlr.press/v119/kumar20e.html. |
[23] |
Lessmann S, Baesens B, Seow HV, et al. (2015) Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Europ J Oper Res 247: 124–136. https://doi.org/10.1016/j.ejor.2015.05.030 doi: 10.1016/j.ejor.2015.05.030
![]() |
[24] |
Li K, Niskanen J, Kolehmainen M, et al. (2016) Financial innovation: Credit default hybrid model for SME lending. Expert Syst Appl 61: 343–355. https://doi.org/10.1016/j.eswa.2016.05.029 doi: 10.1016/j.eswa.2016.05.029
![]() |
[25] | Li X, Liu S, Li Z, et al. (2020) Flowscope: spotting money laundering based on graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, 34: 4731–4738. https://doi.org/10.1609/aaai.v34i04.5906 |
[26] | Lou Y, Caruana R, Gehrke J, et al. (2013) Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 623–631. https://dl.acm.org/doi/abs/10.1145/2487575.2487579 |
[27] | Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–4777. Available from: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html. |
[28] | Mckee TE (2000) Developing a bankruptcy prediction model via rough sets theory. Intell Syst Account Financ Manage 9: 159–173. https://doi.org/fv22ks |
[29] | Merton RC (1974) On the pricing of corporate debt: The risk structure of interest rates. J Fnanc 29: 449–470. |
[30] | Mester LJ (1997) What's the point of credit scoring? Federal Reserve Bank of Philadelphia. Bus Rev 3: 3–16. Available from: https://fraser.stlouisfed.org/files/docs/historical/frbphi/businessreview/frbphil_rev_199709.pdf. |
[31] |
Min JH, Lee YC (2005) Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Syst Appl 28: 603–614 https://doi.org/10.1016/j.eswa.2004.12.008 doi: 10.1016/j.eswa.2004.12.008
![]() |
[32] | Molnar C, König G, Herbinger J, et al. (2020) Pitfalls to avoid when interpreting machine learning models. Working Paper, University of Vienna. Available from: http://eprints.cs.univie.ac.at/6427/. |
[33] | Odom MD, Sharda R (1990) A neural network model for bankruptcy prediction. In Joint Conference on Neural Networks, 163–168. IEEE Press, Piscataway, NJ. Available from: https://ieeexplore.ieee.org/abstract/document/5726669. |
[34] | Opitz D, Maclin R (1999) Popular ensemble methods: An empirical study. J Artif Intell Res 11: 169–198. |
[35] | Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 1135–1144. Available from: https://dl.acm.org/doi/abs/10.1145/2939672.2939778. |
[36] |
Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1: 206–215. https://doi.org/10.1038/s42256-019-0048-x doi: 10.1038/s42256-019-0048-x
![]() |
[37] | Slack D, Hilgard S, Jia E, et al. (2020) Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–186. https://dl.acm.org/doi/abs/10.1145/3375627.3375830 |
[38] | Sudjianto A, Knauth W, Rahul S, et al. (2020) Unwrapping the black box of deep ReLU networks: Interpretability, diagnostics, and simplification. arXiv preprin, Cornell University. https://arXiv.org/abs/2011.04041 |
[39] | Sudjianto A, Zhang A (2021) Designing inherently interpretable machine learning models. In Proceedings of ACM ICAIF 2021 Workshop on Explainable AI in Finance. ACM, New York. https://arXiv.org/abs/2111.01743 |
[40] | Sudjianto A, Zhang A, Yang Z, et al. (2023) PiML toolbox for interpretable machine learning model development and validation. arXiv preprint arXiv. https://doi.org/10.48550/arXiv.2305.04214 |
[41] | U.S. Banking Regulatory Agencies (2011) The U.S. Office of the Comptroller of the Currency and the Board of Governors of Federal Reserve System. SR 11-7/OCC11-12: Supervisory Guidance on Model Risk Management. Washington, D.C. Available from: https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf. |
[42] | U.S. Banking Regulatory Agencies (2021) The U.S. Office of Comptroller of the Currency, the Board of Governors of the Federal Reserve System, the Federal Deposit Insurance Corporation, the Consumer Financial Protection Bureau, and the National Credit Union Administration. Request for Information and Comment on Financial Institutions' Use of Artificial Intelligence, Including Machine Learning. Washington, D.C. Available from: https://www.federalregister.gov/documents/2021/03/31/2021-06607/requestfor-information-and-comment-on-financial-institutions-use-of-artificialintelligence. |
[43] | U.S. Office of the Comptroller of the Currency (2021) Comptroller's Handbook on Model Risk Management. Washington, D.C. Available from: https://www.occ.treas.gov/publicationsand-resources/publications/comptrollers-handbook/files/model-riskmanagement/index-model-risk-management.html. |
[44] |
Vassiliou PC (2013) Fuzzy semi-Markov migration process in credit risk. Fuzzy Sets and Syst 223: 39–58. https://doi.org/10.1016/j.fss.2013.02.016 doi: 10.1016/j.fss.2013.02.016
![]() |
[45] |
Yang Z, Zhang A, Sudjianto A (2021a) Enhancing explainability of neural networks through architecture constraints. IEEE T Neur Net Learn Syst 32: 2610–2621. https://doi.org/10.1109/TNNLS.2020.3007259 doi: 10.1109/TNNLS.2020.3007259
![]() |
[46] |
Yang Z, Zhang A, Sudjianto A (2021) GAMI-Net: An explainable neural network based on generalized additive models with structured interactions. Pattern Recogn 120: 108192. https://doi.org/10.1016/j.patcog.2021.10819 doi: 10.1016/j.patcog.2021.10819
![]() |
[47] |
Zhu Y, Xie C, Wang G J, et al. (2017) Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China's SME credit risk in supply chain finance. Neural Comput Appl 28: 41–50. https://doi.org/10.1007/s00521-016-2304-x doi: 10.1007/s00521-016-2304-x
![]() |