Abstract
Artificial intelligence (AI) models can effectively identify the financial risks existing in Chinese manufacturing enterprises. We use the financial ratios of 1668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An AI model is used to obtain the financial distress prediction value for the listed manufacturing enterprises. Our results show that the random forest model has high accuracy in terms of the empirical prediction of the financial distress of Chinese manufacturing enterprises, which reflects the effectiveness of the AI model in predicting the financial distress of the listed manufacturing enterprises. Profitability has the highest degree of importance for predicting financial distress in manufacturing firms, especially the return on equity. The results in this paper have good policy implications for how to use the AI model to improve the early warning and monitoring system of financial risks and enhance the ability of financial risk prevention and control.
1. Introduction
Against the backdrop of strong downward pressure on the economy, a growing number of companies are running into operating difficulties, or even bankruptcy, as a result of financial distress. A financial crisis at a listed company not only exposes investors and creditors to large losses, but it also sets off a chain reaction that calls into question the standardization of the securities market and the governance ability of listed companies. Research on predicting the financial risk of listed companies helps stakeholders avoid investment risks in a timely manner, protects their rights and interests and helps rebuild investor confidence. Chinese president Xi Jinping has repeatedly stressed the need to grasp the characteristics of artificial intelligence development, strengthen the integration of artificial intelligence with industrial development and provide new momentum for high-quality development. Applying artificial intelligence technology to improve the prediction of financial distress in manufacturing enterprises can support intelligent supervision and the high-quality development of technology-enabled finance, and help prevent systemic financial risks.
In the existing literature, models for financial distress prediction mainly estimate financial risk by calculating default probability and predicting bankruptcy. Altman [1] pioneered the use of financial ratios to predict corporate bankruptcy, and most subsequent research has used financial ratios to predict financial distress. Classical statistical models for calculating the default probability or default premium include linear probability, logit, probit and linear discriminant analysis (Z-score) models [2,3,4]. What these models have in common is the inclusion of highly correlated financial indicators, such as a combination of five (or seven) financial ratios, to predict corporate bankruptcy [5,6,7,8]. The methods differ as follows. The linear probability model predicts financial risk by linear regression. In the logit model, the residuals follow a standard logistic distribution, while in the probit model, they follow a standard normal distribution. Linear discriminant analysis produces an overall discriminant score by means of a multivariate discriminant model. These classical statistical models have remained widely used since the 1960s because their formulas are simple and easy to understand and they are not difficult to implement [9,10,11]. However, the linearity, normality and independence assumptions of statistical models are difficult to satisfy at the same time, which limits their validity and applicability [5,12,13,14].
Monitoring methods based on artificial intelligence and data mining have developed rapidly in recent years. The use of artificial intelligence models such as support vector machines [15], decision trees [16] and neural networks [17] to predict corporate financial distress has gradually become mainstream. Compared with traditional statistical models, artificial intelligence models do not make strictly restrictive assumptions about the distribution of data, and they can also handle big data effectively [12,18]. As a result, the stability, accuracy and applicability of the forecast results have greatly improved [19]. Although many artificial intelligence models are used to predict corporate financial distress, a system of indicators with multiple data types can make a single classification method perform poorly. More specifically, single decision trees and neural networks tend to underperform ensemble learning methods when dealing with multiple data types. In addition, a neural network works like a black box: monitoring indicators are fed in, the hidden layers and number of nodes are set, and monitoring results come out, which makes the network difficult to interpret [20,21,22]. Since Chen and Guestrin [23] proposed extreme gradient boosting (XGBoost), empirical studies on bankruptcy prediction have achieved better prediction results. Carmona et al. [24] used the XGBoost algorithm to predict bank failure and found that it discriminates better than other methods in predicting financial distress. Ensemble learning changes the distribution of the original training samples by various means to build multiple different classifiers, and then uses a linear combination of these classifiers to obtain a more powerful classifier that makes the final decision [12].
Many studies have compared the performance of various traditional and artificial intelligence models, such as logit regression, probit regression, linear discriminant analysis, decision tree, neural network, support vector machine, random forest and ensemble learning models [2,12,23,25,26]. Ensemble learning avoids overfitting, has good generalization performance and shows the best prediction performance, especially when the gradient boosting tree is applied in an ensemble learning model.
At present, there is little literature on the application of artificial intelligence models to the prediction of financial distress in Chinese manufacturing enterprises. How can the accuracy of such predictions be improved? To answer this question, we use the financial ratios of 1,668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An artificial intelligence model is used to obtain the financial distress prediction values for the listed manufacturing enterprises. Our results can provide early warning of the financial risks of Chinese manufacturing enterprises and help financial management departments and provincial governments strengthen the monitoring and early warning of financial risks in the manufacturing industry. Compared with the existing literature, the marginal contributions of this paper are mainly reflected in the following two points. First, we use seven statistical and artificial intelligence models to predict the financial distress of the listed manufacturing enterprises. From the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and categorical boosting (CatBoost) models, we select the one most suitable for predicting the financial distress of China's listed manufacturing enterprises. This provides evidence for the selection of a financial distress prediction model for manufacturing enterprises and a new empirical basis for the application of artificial intelligence models in China's financial risk monitoring and early warning. Second, our empirical study of financial distress prediction for listed manufacturing firms extends the related research on corporate risk and intelligent regulation.
The remainder of the paper is organized as follows. Section 2 presents a financial distress prediction framework based on an artificial intelligence model and describes the random forest model and the model evaluation method. Section 3 presents an empirical study based on the sample data, including the predictive performance and the most important indicators. Section 4 presents the conclusions and policy implications.
2. Methodology
2.1. Modeling methods
2.1.1. Model comparison and selection
We have chosen the random forest model to predict financial distress for the following reasons:
First, artificial intelligence models do not make strictly restrictive assumptions about the distribution of data as compared with traditional statistical models. The stability, accuracy and applicability of the forecast results have been greatly improved [19]. Single decision trees and neural networks tend to underperform ensemble learning methods when dealing with multiple data types. In addition, the principle of the neural network is like putting monitoring indicators into a black box and then setting the hidden layer and node number to get monitoring results, which makes the interpretation of the neural network difficult [20,21,22]. Ensemble learning can use some (different) means to change the distribution of the original training samples to build multiple different classifiers; it can also use the linear combination of these classifiers to obtain a more powerful classifier to make the final decision [12].
Second, the random forest model is favored by a growing number of scholars. It is the most common and widely used supervised method among ensemble learning algorithms. Moreover, according to existing research, the random forest performs best among machine learning models, with a prediction accuracy above 85% [19]. The random forest model not only has high prediction accuracy, but it can also balance the error on imbalanced data. Since companies in financial distress account for a small proportion of the overall sample, the sample of this study is imbalanced, which makes the random forest model suitable for our study.
Third, once trained, the random forest model can score the importance of each feature by using the Gini coefficient. This property reduces the black-box nature of machine learning methods.
2.1.2. Random forest
The random forest is a popular machine learning model that is widely used in credit scoring and bankruptcy prediction. It was first proposed by Breiman [27], and it is a classifier containing multiple decision trees. The basic idea is to combine multiple weak decision trees into a single strong classifier that outperforms any individual tree. Each tree votes for a class, and the mode of these votes is taken as the final prediction. Based on the setting utilized by Katuwal et al. [28], we set the random forest model as follows:
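$$ p(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x) \tag{1} $$

where $p_t(c \mid x)$ is the probability distribution over classes $c$ that each tree $t$ assigns to a sample $x$, and $T$ is the number of trees.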
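As a rough illustration of how a forest aggregates its trees, the sketch below averages per-tree class probabilities and, alternatively, takes the mode of the per-tree votes. The tree outputs are made-up numbers and the function names are hypothetical; this is a Python sketch under those assumptions, not the R implementation used in this paper.

```python
from collections import Counter

def average_tree_probabilities(tree_probs):
    """Average the class-probability vectors produced by the individual trees."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    return [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]

def majority_vote(tree_votes):
    """Return the class label predicted by the most trees (the mode)."""
    return Counter(tree_votes).most_common(1)[0][0]

# Hypothetical outputs of three trees for one firm: [P(healthy), P(distressed)]
tree_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
forest_probs = average_tree_probabilities(tree_probs)

# Each tree votes for its most probable class (0 = healthy, 1 = distressed)
votes = [max(range(len(p)), key=lambda c: p[c]) for p in tree_probs]
forest_label = majority_vote(votes)
print(forest_probs, forest_label)
```

In this toy case both aggregation rules agree: the averaged probability of distress exceeds 0.5 and two of the three trees vote for distress.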
The random forest is compared with benchmark methods, namely, logistic regression, neural networks, decision trees, support vector machines, XGBoost and CatBoost, to verify its actual prediction performance.
2.2. Choice of tuning parameters
The logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models were all built in R. The parameters of the logistic regression and decision tree were set to their default values. To improve classification performance and avoid overfitting, we tuned the hyperparameters of the neural network, random forest, support vector machine, XGBoost and CatBoost models by using grid search and cross-validation. Grid search is a parameter optimization technique that is widely used for tuning artificial intelligence models. The optimal parameter settings of the artificial intelligence models described in this paper are shown in Table 1.
Table 1. Tuning hyperparameters for models (NN: neural network; RF: random forest; Tree: decision tree; SVM: support vector machine).
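The combination of grid search and cross-validation can be sketched as follows. This is a minimal Python illustration, not the R code used in this paper: the helper names, the toy threshold model, the data and the parameter grid are all assumptions chosen to keep the example self-contained.

```python
from itertools import product
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(X, y, fit, grid, k=5):
    """Return (best_params, best_mean_accuracy) over all parameter combinations."""
    folds = k_fold_indices(len(X), k)
    best_params, best_score = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            # Fit on the training folds, then score on the held-out fold
            predict = fit([X[i] for i in train], [y[i] for i in train], **params)
            hits = sum(predict(X[i]) == y[i] for i in held_out)
            scores.append(hits / len(held_out))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def fit_threshold(X_train, y_train, cutoff):
    """Toy stand-in model (hypothetical): flag distress when the ratio is below a cutoff.
    It ignores its training data; a real learner would be fitted here."""
    return lambda x: 1 if x[0] < cutoff else 0

# Made-up single-ratio data: 1 = distressed, 0 = healthy
X = [[v] for v in (-8.0, -5.0, -1.0, 2.0, 4.0, 6.0, 9.0, 12.0, 15.0, 20.0)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
best_params, best_score = grid_search(X, y, fit_threshold, {"cutoff": [-6.0, 0.0, 5.0]}, k=5)
print(best_params, best_score)
```

Every candidate setting is scored on held-out folds only, so the selected setting is the one that generalizes best across folds rather than the one that fits the training data most closely.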
We use accuracy (ACC) and the area under the receiver operating characteristic curve (AUC-ROC) to compare the predictive power of the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models for financial distress. AUC can be understood as the probability that, when one positive sample and one negative sample are drawn at random, the model assigns a higher distress score to the positive sample than to the negative one. Before defining ACC and AUC, we define the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts and the corresponding rates (TPR, FPR, TNR and FNR) according to the confusion matrix shown in Table 2.
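Based on Table 2, the definitions of TPR, FPR, TNR, FNR, ACC and AUC are as follows:

$$ \mathrm{TPR} = \frac{TP}{TP + FN} \tag{2} $$

$$ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{3} $$

$$ \mathrm{TNR} = \frac{TN}{TN + FP} \tag{4} $$

$$ \mathrm{FNR} = \frac{FN}{TP + FN} \tag{5} $$

$$ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6} $$

$$ \mathrm{AUC} = \int_{0}^{1} \mathrm{TPR} \, d(\mathrm{FPR}) \tag{7} $$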
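The evaluation measures can be computed directly from labels and predicted scores. The following Python sketch uses made-up labels and scores and hypothetical function names; it computes the four rates and ACC from the confusion matrix, and AUC from its pairwise-ranking interpretation (ties counted as one half).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN and FN, with 1 = distressed (positive) and 0 = healthy."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """TPR, FPR, TNR, FNR and ACC from the confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (tn + fp),
        "FNR": fn / (tp + fn),
        "ACC": (tp + tn) / (tp + fp + tn + fn),
    }

def auc_rank(y_true, scores):
    """Share of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: two distressed firms among eight
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary threshold

r = rates(*confusion_counts(y_true, y_pred))
auc = auc_rank(y_true, scores)
print(r, auc)
```

ACC depends on the chosen threshold, whereas AUC does not, which is why the two measures can rank models differently on an imbalanced sample.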
3. Results
3.1. Variables and data sources
3.1.1. Variables
According to the actual situation of financial distress in China's manufacturing industry and scholars' attention to the financial indicators of the manufacturing industry [2,9,29,30], we divided the collected financial ratios into six dimensions to predict the financial distress of the manufacturing industry, namely, profitability, solvency, development ability, operating ability, cash flow ability and capital structure, as shown in Table 3.
Table 3. Definitions of explanatory variables.

| Financial dimension | Variable | Definition | Formula |
|---|---|---|---|
| Profitability | X1 | Return on equity | Net income / Shareholders' equity |
| Profitability | X2 | Return on assets | Net income / Total assets |
| Profitability | X3 | Net profit margin | Net income / Operating revenue |
| Profitability | X4 | Expense to sales | Sales expenses / Operating revenue |
| Solvency | X5 | Current ratio | Current assets / Current liabilities |
| Solvency | X6 | Liquidity ratio | Current asset stocks / Current liabilities |
| Solvency | X7 | Equity ratio | Total liabilities / Shareholders' equity |
| Solvency | X8 | Operating cash flow-to-debt ratio | Net cash flow from operations / Total debt |
| Solvency | X9 | Operating cash flow-to-current liabilities ratio | Net cash flow from operations / Current liabilities |
| Development ability | X10 | Year-on-year growth rate of operating revenue | Increase in current year's operating income / Total previous year's operating income |
| Development ability | X11 | Year-on-year growth rate of operating profit | Current year profit growth / Total previous year profit |
| Development ability | X12 | Year-on-year growth rate of operating cash flow | Current year operating net cash flow growth / Total previous year operating net cash flow |
| Development ability | X13 | Growth rate of net assets | Current year net assets growth / Total previous year net assets |
| Development ability | X14 | Growth rate of total assets | Current year total assets growth / Total previous year total assets |
| Operating ability | X15 | Account receivable turnover | Operating income / Average accounts receivable |
| Operating ability | X16 | Current asset turnover | Operating income / Average net current assets |
| Operating ability | X17 | Fixed assets turnover | Operating income / Average net fixed assets |
| Operating ability | X18 | Total assets turnover | Operating income / Average total assets |
| Cash flow ability | X19 | Sales-to-cash flow ratio | Net cash flow from operations / Operating income |
| Cash flow ability | X20 | Operating income cash ratio | Cash received from selling goods and providing services / Operating income |
| Cash flow ability | X21 | Cash recovery ratio of total assets | Net cash flow from operations / Total assets |
| Capital structure | X22 | Asset-liability ratio | Total liabilities / Total assets |
| Capital structure | X23 | Current assets ratio | Current assets / Total assets |
| Capital structure | X24 | Fixed assets ratio | Fixed assets / Total assets |
| Capital structure | X25 | Current debt ratio | Current debt / Total liabilities |
| Capital structure | X26 | Debt-to-long capital ratio | Total long-term liabilities / (Total long-term liabilities + shareholders' equity) |
According to the industry classification guidelines for listed companies issued by the China Securities Regulatory Commission (CSRC) in 2012, we selected 1,668 Chinese A-share listed manufacturing enterprises as samples after eliminating enterprises with considerable missing data. We applied the artificial intelligence models introduced in Section 2 to predict the financial distress of manufacturing enterprises. The sample data were annual data from 2016 to 2021, and the financial data were mainly taken from the Wind database. In general, financial distress leads to an erosion of a firm's profitability. Much of the research in this area has looked at bankruptcy as a result of financial distress. However, data on the bankruptcies of listed companies in China are difficult to obtain. As an alternative, the "Special Treatment" (ST) designation of a company, as defined by the CSRC, may be seen as a close proxy for financial distress [31,32,33,34]. We therefore determine whether a manufacturing firm is in financial distress in a given year from the timing of ST or delisting: the year of ST or delisting is recorded as the year in which the enterprise is in financial distress. After eliminating the missing values, there remained 10,007 observations for the 1,668 enterprises, of which 509 were in financial distress, accounting for 5.09% of the total sample. Using random sampling, 70% of the sample was used as the training set and 30% as the test set. The descriptive statistics of each indicator are shown in Table 4.
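The labeling rule and the random 70/30 split described above can be sketched in a few lines of Python. The function names, the seed and the rounding of the split point are assumptions made for illustration; only the sample counts (10,007 observations, 509 in distress) come from the text.

```python
import random

def distress_label(year, st_or_delisting_year):
    """1 if the firm received ST or was delisted in this year, else 0."""
    return 1 if st_or_delisting_year is not None and year == st_or_delisting_year else 0

def train_test_split_indices(n_obs, train_share=0.7, seed=42):
    """Randomly assign observation indices to a training and a test set."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    cut = round(n_obs * train_share)
    return idx[:cut], idx[cut:]

n_obs, n_distressed = 10007, 509
train_idx, test_idx = train_test_split_indices(n_obs)
distress_share = 100 * n_distressed / n_obs  # share of distressed observations, in percent

print(len(train_idx), len(test_idx), round(distress_share, 2))
print(distress_label(2020, 2020), distress_label(2019, 2020), distress_label(2019, None))
```

With these counts the split yields 7,005 training and 3,002 test observations, and distressed observations make up about 5.09% of the sample, which confirms how imbalanced the two classes are.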
Table 4. Descriptive statistics.

| Variable | Mean | Max. | Min. | Std. Dev. | Skew. | Kurt. |
|---|---|---|---|---|---|---|
| X1 | 6.3313 | 5400.1671 | −1277.1794 | 62.5843 | 65.3038 | 5699.0080 |
| X2 | 3.9559 | 81.5616 | −97.5232 | 8.3262 | −1.6777 | 17.4696 |
| X3 | 2.6982 | 156.3490 | −5442.7347 | 76.1195 | −51.4017 | 3291.6990 |
| X4 | 21.5984 | 1750.3709 | −47.9374 | 27.3027 | 32.2398 | 1789.8810 |
| X5 | 2.3846 | 80.6637 | 0.0715 | 2.8005 | 8.7953 | 145.8414 |
| X6 | 1.8766 | 60.4173 | 0.0484 | 2.4454 | 8.2501 | 119.3816 |
| X7 | 1.2107 | 194.0507 | −181.8149 | 4.9871 | 14.0926 | 742.7602 |
| X8 | 0.1930 | 5.0480 | −11.7356 | 0.3834 | −0.3168 | 124.8073 |
| X9 | 0.2343 | 6.7687 | −15.3086 | 0.4877 | −1.0176 | 139.3556 |
| X10 | 19.9843 | 8269.9179 | −98.8233 | 137.1655 | 39.9626 | 1994.5710 |
| X11 | 39.6931 | 1398353 | −300540 | 14905.2300 | 83.3389 | 7974.3380 |
| X12 | 139.6043 | 145302.7150 | −66573.7540 | 2998.6370 | 26.1186 | 1282.3700 |
| X13 | 19.6424 | 17632.2353 | −98.5955 | 221.7191 | 60.5292 | 4381.8630 |
| X14 | 14.2982 | 1839.0951 | −92.9032 | 48.0487 | 17.4981 | 490.9641 |
| X15 | 40.0740 | 20873.5102 | 0.0353 | 442.8323 | 28.0359 | 955.5722 |
| X16 | 1.2767 | 12.0655 | 0.0061 | 0.8783 | 2.8939 | 18.3175 |
| X17 | 14.6846 | 55200.7412 | 0.0213 | 631.1668 | 78.1708 | 6431.8200 |
| X18 | 0.6724 | 7.7880 | 0.0060 | 0.4195 | 3.0022 | 24.3635 |
| X19 | 8.2708 | 161.9600 | −782.1000 | 22.9947 | −14.6610 | 426.9572 |
| X20 | 96.7410 | 663.5000 | 8.5200 | 22.4384 | 3.7638 | 70.6896 |
| X21 | 5.2778 | 92.0075 | −70.3538 | 7.3987 | −0.1228 | 10.5305 |
| X22 | 41.3800 | 105.8638 | 0.8359 | 19.0591 | 0.2962 | 2.5969 |
| X23 | 56.5613 | 99.8423 | 4.0784 | 16.6003 | −0.1688 | 2.6022 |
| X24 | 43.4387 | 95.9216 | 0.1577 | 16.6003 | 0.1688 | 2.6022 |
| X25 | 83.4102 | 121.7465 | 3.7539 | 14.8272 | −1.3100 | 4.6938 |
| X26 | 13.0065 | 99.1000 | −23.2981 | 14.9778 | 2.0032 | 8.2377 |

Note: 'Mean', 'Max.', 'Min.', 'Std. Dev.', 'Skew.' and 'Kurt.' denote the average value, maximum value, minimum value, standard deviation, skewness and kurtosis, respectively.
We compare the predictive performance of the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models for the financial distress of listed manufacturing firms, using two evaluation metrics: ACC and AUC-ROC. Tables 5 and 6 show the ACC and AUC results of these seven models when financial indicators with different numbers of early warning periods are used to predict the financial distress of the listed manufacturing enterprises. In Tables 5 and 6, a warning lead time of 1 year (Y−1) means that the prediction of whether a firm is in financial distress in year Y is obtained from the data of year Y−1; longer lead times are defined in the same way.
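To make the warning periods concrete, the sketch below builds lagged predictors from a firm-year panel, so that the distress label of year Y is paired with indicators from years Y−1, Y−2 and Y−3. The panel layout, the function name and the `lag1_`/`lag2_`/`lag3_` naming are assumptions for illustration (the naming is modeled on the `lag2_X1` label used in Section 3.3), and the numbers are made up.

```python
def build_lagged_rows(panel, lags=(1, 2, 3)):
    """panel: {firm: {year: {"X1": ..., "distress": 0 or 1}}}.
    Returns one row per firm-year that has data for every requested lag."""
    rows = []
    for firm, by_year in panel.items():
        for year, current in by_year.items():
            if not all(year - lag in by_year for lag in lags):
                continue  # skip firm-years without the full lag history
            row = {"firm": firm, "year": year, "distress": current["distress"]}
            for lag in lags:
                for name, value in by_year[year - lag].items():
                    if name != "distress":
                        row[f"lag{lag}_{name}"] = value
            rows.append(row)
    return rows

# Made-up panel for one hypothetical firm "A" (X1 = return on equity)
panel = {
    "A": {
        2018: {"X1": 5.0, "distress": 0},
        2019: {"X1": 1.5, "distress": 0},
        2020: {"X1": -4.0, "distress": 0},
        2021: {"X1": -12.0, "distress": 1},
    }
}

rows_3y = build_lagged_rows(panel)             # Y-1, Y-2 and Y-3 together
rows_1y = build_lagged_rows(panel, lags=(1,))  # Y-1 only
print(rows_3y)
print(len(rows_1y))
```

Longer lag windows leave fewer usable firm-years, because each row needs a complete history; this is a trade-off of the multi-period settings in the tables that follow.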
Table 5. ACC of different prediction models by period; logistic regression: Logistic.

| Period | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.9647 | 0.9562 | 0.9583 | 0.9569 | 0.9626 | 0.9562 | 0.9605 | 0.9593 |
| Y−1 | 0.9647 | 0.9569 | 0.9597 | 0.9583 | 0.9626 | 0.9562 | 0.9633 | 0.9603 |
| Y−2 | 0.9612 | 0.9541 | 0.9583 | 0.9612 | 0.9626 | 0.9619 | 0.9612 | 0.9600 |
| Y−3 | 0.9619 | 0.9626 | 0.9400 | 0.9569 | 0.9626 | 0.9562 | 0.9626 | 0.9575 |
| Y−1, Y−2 | 0.9612 | 0.9513 | 0.9520 | 0.9619 | 0.9626 | 0.9605 | 0.9647 | 0.9591 |
| Y−1, Y−2, Y−3 | 0.9583 | 0.9548 | 0.9548 | 0.9682* | 0.9626 | 0.9682* | 0.9675 | 0.9621 |
| Mean | 0.9620 | 0.9560 | 0.9539 | 0.9606 | 0.9626 | 0.9599 | 0.9633 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
Table 6. AUC-ROC calculated by type of model and period.

| Period | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.8018 | 0.7800 | 0.6449 | 0.8711 | 0.7618 | 0.8257 | 0.8529 | 0.7912 |
| Y−1 | 0.8089 | 0.7794 | 0.7693 | 0.8660 | 0.7932 | 0.8415 | 0.8467 | 0.8150 |
| Y−2 | 0.7764 | 0.6351 | 0.7995 | 0.8645 | 0.8219 | 0.8631 | 0.8704 | 0.8044 |
| Y−3 | 0.7723 | 0.7420 | 0.6120 | 0.8291 | 0.7360 | 0.8100 | 0.8202 | 0.7602 |
| Y−1, Y−2 | 0.8304 | 0.8051 | 0.6541 | 0.9149 | 0.8423 | 0.8655 | 0.8827 | 0.8279 |
| Y−1, Y−2, Y−3 | 0.7990 | 0.8030 | 0.7765 | 0.9390* | 0.8418 | 0.9194 | 0.9155 | 0.8563 |
| Mean | 0.7981 | 0.7574 | 0.7094 | 0.8808 | 0.7995 | 0.8542 | 0.8647 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
First, the CatBoost model has the highest average ACC (96.33%) for financial distress prediction in different warning periods. The random forest has the highest average AUC (0.8808) for financial distress prediction in different warning periods. All models had an average ACC of more than 95% for financial distress predictions. In addition, the random forest, XGBoost, and CatBoost models had average AUC values of over 0.85. Similar to the results obtained in existing literature, the random forest and gradient boosting algorithms showed the best prediction performance for a single model prediction framework. For example, Ben Jabeur et al. [2] compared multiple models to predict the financial distress of French enterprises; they found that only the random forest, XGBoost and CatBoost models achieved an average accuracy of more than 80%. Xia et al. [35] used linear regression, random forest and gradient boosting algorithms to score the credit of P2P lending customers, and the results showed that the prediction performance of the random forest and CatBoost models was better than that of traditional statistical models.
Second, in terms of the number of early warning periods, the financial indicators of the preceding 1-3 years were found to have the best early warning effect for the financial distress of the listed manufacturing enterprises. Specifically, the average ACC and average AUC obtained with an early warning lead period of 1-3 years were both the maximum values, i.e., 96.21% and 0.8563, respectively. In addition, the average accuracy for every warning lead period was over 95%. Consistent with most reports, the accuracy of the prediction models constructed in this study tends to decrease as the warning lead time increases [2].
Finally, the ROC curve was drawn based on the results of the TPR (also known as sensitivity) and TNR (also known as specificity) obtained by setting different classification thresholds. Figures 1-6 show the ROC curves of the seven financial distress prediction models at different warning periods. It can be ascertained from the ROC curve that the performance of the financial distress prediction model is obviously divided into three levels. The first level consists of random forest, XGBoost and CatBoost algorithms. The second level is composed of logistic regression, decision tree and support vector machine algorithms. The third level is a neural network.
Figure 1. Comparison of different models using the ROC at the current period.
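The construction of an ROC curve from classification thresholds can be sketched as follows. This is an illustrative Python example with made-up labels and scores and hypothetical function names; it sweeps the threshold over every distinct score, records the (FPR, TPR) pairs, where FPR equals 1 − TNR, and integrates the curve with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(t == 1 and s >= thr for t, s in zip(y_true, scores))
        fp = sum(t == 0 and s >= thr for t, s in zip(y_true, scores))
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up example: two distressed firms (label 1) among eight
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]

points = roc_points(y_true, scores)
area = trapezoid_auc(points)
print(points)
print(area)
```

A curve that rises steeply toward the top-left corner, as for the first-level models, encloses a larger area than one that stays close to the diagonal.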
In general, the random forest model is highly effective in predicting the financial distress of listed manufacturing enterprises based on the data of a 1-3 year early warning period, with the ACC and AUC both reaching their maximum values, i.e., 96.82% and 0.9390, respectively. Specificity and sensitivity also confirmed excellent performance across different classification thresholds. This shows that the data of a 1-3 year early warning period are more consistent with the financial distress warning period of China's listed manufacturing enterprises, and that the random forest model can better describe the impact of financial indicators on financial distress; the combination of the two achieves a better warning effect.
3.3. Feature importance
The importance of an indicator reflects the contribution of a single financial indicator to the prediction of the financial distress of listed manufacturing enterprises. Figure 7 depicts the 10 most important financial indicators when the random forest model uses the data of the 1-3 year warning period to predict the financial distress of listed manufacturing enterprises. We measured the importance of an indicator as the decline in accuracy after removing that indicator.
Figure 7. Ten most important financial indicators at the 1-3 year horizon.
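The idea of measuring importance as the accuracy lost when an indicator is removed can be sketched as follows. This Python example is only an illustration under stated assumptions: a toy nearest-centroid classifier evaluated in-sample stands in for the random forest, the data are made up, and the function and feature names are hypothetical.

```python
def nearest_centroid_accuracy(rows, features):
    """In-sample accuracy of a toy nearest-centroid classifier on the chosen features."""
    centroids = {}
    for label in (0, 1):
        members = [r for r in rows if r["distress"] == label]
        centroids[label] = [sum(r[f] for r in members) / len(members) for f in features]

    def predict(r):
        dist = {
            label: sum((r[f] - c) ** 2 for f, c in zip(features, centre))
            for label, centre in centroids.items()
        }
        return min(dist, key=dist.get)

    return sum(predict(r) == r["distress"] for r in rows) / len(rows)

def drop_importance(rows, features, evaluate):
    """Importance of each feature = baseline accuracy minus accuracy without it."""
    baseline = evaluate(rows, features)
    return {
        name: baseline - evaluate(rows, [f for f in features if f != name])
        for name in features
    }

# Made-up firm-years: lagged return on equity (X1) and lagged current ratio (X5)
rows = [
    {"lag1_X1": -10.0, "lag1_X5": 1.0, "distress": 1},
    {"lag1_X1": -6.0, "lag1_X5": 2.0, "distress": 1},
    {"lag1_X1": 8.0, "lag1_X5": 1.2, "distress": 0},
    {"lag1_X1": 5.0, "lag1_X5": 2.2, "distress": 0},
    {"lag1_X1": 7.0, "lag1_X5": 1.8, "distress": 0},
    {"lag1_X1": 6.0, "lag1_X5": 1.5, "distress": 0},
]
importance = drop_importance(rows, ["lag1_X1", "lag1_X5"], nearest_centroid_accuracy)
print(importance)
```

In this toy data the profitability indicator separates the two classes, so removing it costs accuracy, while removing the liquidity indicator costs nothing.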
It can be seen that, when predicting the financial distress of China's listed manufacturing enterprises, the profitability of manufacturing enterprises needs special emphasis, especially the return on equity (X1). This result is similar to the conclusion of Mohamed et al. [36]. Their study used an adaptive neuro-fuzzy model to investigate the relationship between a firm's performance indicators and share price prediction; they found that the return on equity was the most significant predictor. Profitability (X1, X2, X3 and X4) is the most important of all dimensions for predicting financial distress. When the indicators of the profitability dimension are excluded, the accuracy of the model prediction decreases by 5.70%. The reason may be as stated by Zhou et al. [37]: poor profitability leads to low financial performance, greater operational risks and limited growth and development. Return on equity at the 2-year horizon (lag2_X1) had the highest importance (1.17%) among all indicators of different warning periods. The second is the return on assets at the 2-year horizon (lag2_X2), which had an importance of 0.80%. The net profit margin at the 1-year horizon (lag_X3) ranked third in importance, with a value of 0.75%. In terms of a single indicator, return on equity (X1) made the largest contribution to the financial distress prediction of the listed manufacturing enterprises, reaching 1.91%. The above results show that manufacturing enterprises are in financial distress mainly due to a low profit margin. The return on equity (X1) and other indicators reflecting profitability can effectively predict the current profits and losses of manufacturing enterprises, and they can also predict the probability of the future financial distress of manufacturing enterprises to a large extent. In recent years, China's financing policy has continued to tighten, and the financing cost of manufacturing enterprises has risen.
During the COVID-19 epidemic, prevention and control measures led to the shutdown of most manufacturing enterprises, which seriously hindered their normal production and operation and exposed them to many risks, such as supply chain disruption. This has also become the main reason why most manufacturing companies are in financial trouble. The return on equity (X1), return on assets (X2) and net profit margin (X3) ranked as the top three in importance, again indicating that profitability is an indispensable part of the financial distress early warning system of Chinese manufacturing enterprises.
3.4. Out-of-sample forecasting performance
The previous empirical results show that the random forest model has the best effect on financial distress prediction and can accurately identify which companies are in financial distress. The accuracy of predicting financial distress by using a random forest with the data of a 1-3 year early warning period reached 96.82%, and the AUC was 0.9390. To better test the forecasting performance of the random forest model on financial distress, we used a sample of listed manufacturing firms in China in 2021 for out-of-sample forecasting. There were 124 listed manufacturing companies in financial distress and 1,537 healthy companies in 2021. The model trained in the previous section was used for out-of-sample prediction, and the prediction results are shown in Tables 7 and 8. In the out-of-sample prediction of the financial distress of China's listed manufacturing enterprises in 2021, the random forest model using the data of a 1-3 year early warning period still had the best performance among all prediction frameworks, with an accuracy of 98.75% and an AUC of 0.9901. This shows that the random forest model has excellent out-of-sample forecasting performance and is suitable for identifying the financial distress of China's listed manufacturing firms. Therefore, it is necessary to use nonlinear tools such as a random forest model to predict enterprise financial distress and improve the early warning mechanism of enterprise financial distress.
Table 7.
ACC of out-of-sample forecasting.
| Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.9401 | 0.9266 | 0.9401 | 0.9774 | 0.9597 | 0.9645 | 0.9368 | 0.9493 |
| Y−1 | 0.9384 | 0.9253 | 0.9409 | 0.9797 | 0.9562 | 0.9681 | 0.9374 | 0.9494 |
| Y−2 | 0.9416 | 0.9374 | 0.9398 | 0.9803 | 0.9626 | 0.9717 | 0.9567 | 0.9557 |
| Y−3 | 0.9314 | 0.9253 | 0.9302 | 0.9694 | 0.9562 | 0.9681 | 0.9332 | 0.9448 |
| Y−1, Y−2 | 0.9442 | 0.9386 | 0.9467 | 0.9827 | 0.9562 | 0.9717 | 0.9554 | 0.9565 |
| Y−1, Y−2, Y−3 | 0.9525 | 0.9488 | 0.9375 | 0.9875* | 0.9375 | 0.9825 | 0.9663 | 0.9589 |
| Mean | 0.9414 | 0.9337 | 0.9392 | 0.9795 | 0.9547 | 0.9711 | 0.9476 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
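As a reading aid for the ACC and AUC figures, the sketch below computes both metrics in plain Python and applies them to a trivial benchmark built from the class counts reported above (124 distressed and 1,537 healthy firms in 2021). The function names are illustrative assumptions, and this is not the evaluation code used in the paper.

```python
def accuracy(y_true, y_pred):
    """Share of firms whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen distressed firm receives
    a higher score than a randomly chosen healthy firm (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Class counts reported for the 2021 out-of-sample set: 1 = distressed, 0 = healthy.
y_true = [1] * 124 + [0] * 1537

# Trivial benchmark: label every firm healthy and give every firm the same score.
always_healthy = [0] * len(y_true)
constant_scores = [0.0] * len(y_true)

baseline_acc = accuracy(y_true, always_healthy)  # equals 1537 / 1661
baseline_auc = roc_auc(y_true, constant_scores)  # no ranking ability
```

Because the sample is imbalanced, this uninformative benchmark already reaches an accuracy of 1537/1661, roughly 92.5%, while its AUC is only 0.5. This is why the AUC values are a useful complement to the accuracy values when comparing models.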
We used the financial ratios of 1,668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An artificial intelligence model was used to obtain financial distress prediction values for these enterprises. Our results show that the random forest model predicts the financial distress of Chinese manufacturing enterprises with high accuracy, which reflects the effectiveness of artificial intelligence models for this task. Profitability, especially the return on equity, is the most important factor for predicting financial distress in manufacturing firms.
Based on the results in this paper and the current trend of financial risks in the manufacturing industry, our research provides some insights for intelligent supervision.
First, regulators should take profitability indicators lagged by 1-3 years as early warning indicators of corporate distress to improve the long-term mechanism of financial risk prevention and control. The results in this paper show that profitability indicators lagged by 1-3 years are important predictors of enterprise financial distress. Therefore, in improving the early warning mechanism of enterprise financial risk, regulators should consider a comprehensive measurement of enterprise systemic risk and further optimize that mechanism.
Second, cutting-edge machine learning monitoring tools should be introduced where appropriate to improve the prudential supervision mechanism at the technical level. In recent years, with the continuous development of regulatory technology, frontier technologies such as big data and machine learning have been actively promoted and applied in financial supervision in various countries. Combining machine learning methods with financial risk monitoring, early warning, prevention and control systems is likely to become one way of improving the prudential supervision mechanism. The empirical results in this paper show that the forecasting performance of the random forest model is generally better than that of the other forecasting models, with more robust and accurate predictions. Machine learning methods can therefore be used to strengthen the ability to predict financial crises of listed enterprises.
It is worth noting that, although this paper shows that the random forest model can effectively identify the financial distress of the vast majority of enterprises, its identification efficiency will be reduced for financial distress caused by financial fraud, net asset value shrinkage, sudden operational damage and similar events. Regulatory authorities should therefore also use qualitative regulatory means, such as on-site inspection, to urge listed companies to optimize corporate governance and to curb the growth of corporate financial risks. There are several future directions for this research. First, given the constantly changing financial environment, text features or other features from social media could be added to the model to further improve recognition performance. Second, this study treated financial distress prediction as a binary classification problem. However, financial distress varies in degree, and future research should explore prediction models with three or more categories.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (No. 12101622).
Conflict of interest
The authors declare no conflict of interest.
Appendix
A number of abbreviations have been adopted above. For ease of understanding, we include the relevant abbreviations and their explanations in Table A1.
Table A1.
Abbreviations and explanations.
E. I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, 23 (1968), 589–609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x doi: 10.1111/j.1540-6261.1968.tb00843.x
[2]
S. B. Jabeur, C. Gharib, S. Mefteh-Wali, W. B. Arfi, CatBoost model and artificial intelligence techniques for corporate failure prediction, Technol. Forecast. Soc., 166 (2021), 120658. https://doi.org/10.1016/j.techfore.2021.120658 doi: 10.1016/j.techfore.2021.120658
[3]
K. Peng, G. Yan, A survey on deep learning for financial risk prediction, Quant. Financ. Econ., 5 (2021), 716–737. https://doi.org/10.3934/QFE.2021032 doi: 10.3934/QFE.2021032
[4]
T. M. Awan, M. S. Khan, I. U. Haq, S. Kazmi, Oil and stock markets volatility during pandemic times: a review of G7 countries, Green Finance, 3 (2021), 15–27. https://doi.org/10.3934/GF.2021002 doi: 10.3934/GF.2021002
[5]
E. I. Altman, Predicting financial distress of companies: revisiting the Z-score and ZETA® models, In: Handbook of research methods and applications in empirical finance, Edward Elgar Publishing, 2013,428–456. https://doi.org/10.4337/9780857936080.00027
[6]
T. H. Li, X. Li, G. K. Liao, Business cycles and energy intensity. Evidence from emerging economies, Borsa Istanb. Rev., 22 (2022), 560–570. https://doi.org/10.1016/j.bir.2021.07.005 doi: 10.1016/j.bir.2021.07.005
[7]
S. L. Chen, J. H. Zhong, P. Failler, Does China transmit financial cycle spillover effects to the G7 countries?, Economic Research-Ekonomska Istrazivanja, 35 (2022), 5184–5201. https://doi.org/10.1080/1331677x.2021.2025123 doi: 10.1080/1331677x.2021.2025123
[8]
Y. Liu, Z. H. Li, M. R. Xu, The influential factors of financial cycle spillover: evidence from China, Emerg. Mark. Financ. Tr., 56 (2020), 1336–1350. https://doi.org/10.1080/1540496x.2019.1658076 doi: 10.1080/1540496x.2019.1658076
[9]
F. Mai, S. N. Tian, C. Lee, L. Ma, Deep learning models for bankruptcy prediction using textual disclosures, Eur. J. Oper. Res., 274 (2019), 743–758. https://doi.org/10.1016/j.ejor.2018.10.024 doi: 10.1016/j.ejor.2018.10.024
[10]
D. Qiu, D. Li, Comments on the "SSF Report" from the perspective of economic statistics, Green Finance, 3 (2021), 403–463. https://doi.org/ 10.3934/GF.2021020 doi: 10.3934/GF.2021020
[11]
Z. H. Li, J. H. Zhong, Impact of economic policy uncertainty shocks on China's financial conditions, Financ. Res. Lett., 35 (2020), 101303. https://doi.org/10.1016/j.frl.2019.101303 doi: 10.1016/j.frl.2019.101303
[12]
S. P. Zhao, K. Xu, Z. Wang, C. Liang, W. Lu, B. Chen, Financial distress prediction by combining sentiment tone features, Econ. Model., 106 (2022), 105709. https://doi.org/10.1016/j.econmod.2021.105709 doi: 10.1016/j.econmod.2021.105709
[13]
Z. H. Li, H. Dong, C. Floros, A. Charemis, P. Failler, Re-examining Bitcoin volatility: A CAViaR-based approach, Emerg. Mark. Financ. Tr., 58 (2022), 1320–1338. https://doi.org/10.1080/1540496x.2021.1873127 doi: 10.1080/1540496x.2021.1873127
[14]
Z. Huang, H. Dong, S. Jia, Equilibrium pricing for carbon emission in response to the target of carbon emission peaking, Energ. Econ., 112 (2022), 106160. https://doi.org/10.1016/j.eneco.2022.106160 doi: 10.1016/j.eneco.2022.106160
[15]
T. T. Chen, S. J. Lee, A weighted LS-SVM based learning system for time series forecasting, Inform. Sciences, 299 (2015), 99–116. https://doi.org/10.1016/j.ins.2014.12.031 doi: 10.1016/j.ins.2014.12.031
[16]
E. Dumitrescu, S. Hue, C. Hurlin, S. Tokpavi, Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects, Eur. J. Oper. Res., 297 (2022), 1178–1192. https://doi.org/10.1016/j.ejor.2021.06.053 doi: 10.1016/j.ejor.2021.06.053
[17]
J. M. Liu, S. C. Zhang, H. Y. Fan, A two-stage hybrid credit risk prediction model based on XGBoost and graph-based deep neural network, Expert Syst. Appl., 195 (2022), 116624. https://doi.org/10.1016/j.eswa.2022.116624 doi: 10.1016/j.eswa.2022.116624
[18]
S. Bag, S. Gupta, A. Kumar, U. Sivarajah, An integrated artificial intelligence framework for knowledge creation and B2B marketing rational decision making for improving firm performance, Ind. Market. Manag., 92 (2021), 178–189. https://doi.org/10.1016/j.indmarman.2020.12.001 doi: 10.1016/j.indmarman.2020.12.001
[19]
F. Barboza, H. Kimura, E. Altman, Machine learning models and bankruptcy prediction, Expert Syst. Appl., 83 (2017), 405–417. https://doi.org/10.1016/j.eswa.2017.04.006 doi: 10.1016/j.eswa.2017.04.006
[20]
S. Papadopoulos, C. E. Kontokosta, Grading buildings on energy performance using city benchmarking data, Appl. Energ., 233 (2019), 244–253. https://doi.org/10.1016/j.apenergy.2018.10.053 doi: 10.1016/j.apenergy.2018.10.053
[21]
R. Kellner, M. Nagl, D. Rosch, Opening the black box—Quantile neural networks for loss given default prediction, J. Bank. Financ., 134 (2022), 106334. https://doi.org/10.1016/j.jbankfin.2021.106334 doi: 10.1016/j.jbankfin.2021.106334
[22]
D. Qiu, D. Li, Paradox in deviation measure and trap in method improvement—take international comparison as an example, Quant. Financ. Econ., 5 (2021), 591–603. https://doi.org/10.3934/QFE.2021026 doi: 10.3934/QFE.2021026
[23]
T. Q. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, In: KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco: Assoc Computing Machinery, 2016,785–794. https://doi.org/10.1145/2939672.2939785
[24]
P. Carmona, F. Climent, A. Momparler, Predicting failure in the U.S. banking sector: an extreme gradient boosting approach, Int. Rev. Econ. Financ., 61 (2019), 304–323. https://doi.org/10.1016/j.iref.2018.03.008 doi: 10.1016/j.iref.2018.03.008
[25]
D. Ardila, A. Ahmed, D. Sornette, Comparing ask and transaction prices in the Swiss housing market, Quant. Financ. Econ., 5 (2021), 67–93. https://doi.org/10.3934/QFE.2021004 doi: 10.3934/QFE.2021004
[26]
Z. H. Li, H. Chen, B. Mo, Can digital finance promote urban innovation? Evidence from China, Borsa Istanb. Rev., in press. https://doi.org/10.1016/j.bir.2022.10.006
[27]
L. Breiman, Random forests, Machine Learning, 45 (2001), 5–32. https://doi.org/10.1023/a:1010933404324 doi: 10.1023/a:1010933404324
[28]
R. Katuwal, P. N. Suganthan, L. Zhang, Heterogeneous oblique random forest, Pattern Recogn., 99 (2020), 107078. https://doi.org/10.1016/j.patcog.2019.107078 doi: 10.1016/j.patcog.2019.107078
[29]
E. I. Altman, M. Iwanicz-Drozdowska, E. K. Laitinen, A. Suvas, Financial distress prediction in an international context: a review and empirical analysis of Altman's Z-score model, J. Int. Fin. Manag. Acc., 28 (2017), 131–171. https://doi.org/10.1111/jifm.12053 doi: 10.1111/jifm.12053
[30]
G. K. Liao, P. Hou, X. Y. Shen, K. Albitar, The impact of economic policy uncertainty on stock returns: the role of corporate environmental responsibility engagement, Int. J. Financ. Econ., 26 (2021), 4386–4392. https://doi.org/10.1002/ijfe.2020 doi: 10.1002/ijfe.2020
[31]
R. B. Geng, I. Bose, X. Chen, Prediction of financial distress: an empirical study of listed Chinese companies using data mining, Eur. J. Oper. Res., 241 (2015), 236–247. https://doi.org/10.1016/j.ejor.2014.08.016 doi: 10.1016/j.ejor.2014.08.016
[32]
X. B. Tang, S. X. Li, M. L. Tan, W. X. Shi, Incorporating textual and management factors into financial distress prediction: a comparative study of machine learning methods, J. Forecasting, 39 (2020), 769–787. https://doi.org/10.1002/for.2661 doi: 10.1002/for.2661
[33]
H. Li, C. J. Li, X. J. Wu, J. Sun, Statistics-based wrapper for feature selection: an implementation on financial distress identification with support vector machine, Appl. Soft Comput., 19 (2014), 57–67. https://doi.org/10.1016/j.asoc.2014.01.018 doi: 10.1016/j.asoc.2014.01.018
[34]
S. Liu, X. Shen, T. Jiang, P. Failler, Impacts of the financialization of manufacturing enterprises on total factor productivity: empirical examination from China's listed companies, Green Finance, 3 (2021), 59–89. https://doi.org/10.3934/GF.2021005 doi: 10.3934/GF.2021005
[35]
Y. Xia, L. He, Y. Li, N. Liu, Y. Ding, Predicting loan default in peer-to-peer lending using narrative data, J. Forecasting, 39 (2020), 260–280. https://doi.org/10.1002/for.2625 doi: 10.1002/for.2625
[36]
E. A. Mohamed, I. E. Ahmed, R. Mehdi, H. Hussain, Impact of corporate performance on stock price predictions in the UAE markets: Neuro-fuzzy model, Intell. Syst. Account., 28 (2021), 52–71. https://doi.org/10.1002/isaf.1484 doi: 10.1002/isaf.1484
[37]
M. Zhou, H. Liu, Y. Hu, Research on corporate financial performance prediction based on self-organizing and convolutional neural networks, Expert Syst., 39 (2022), e13042. https://doi.org/10.1111/exsy.13042 doi: 10.1111/exsy.13042
Table 5.
ACC of different prediction models by period; logistic regression: Logistic.
| Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.9647 | 0.9562 | 0.9583 | 0.9569 | 0.9626 | 0.9562 | 0.9605 | 0.9593 |
| Y−1 | 0.9647 | 0.9569 | 0.9597 | 0.9583 | 0.9626 | 0.9562 | 0.9633 | 0.9603 |
| Y−2 | 0.9612 | 0.9541 | 0.9583 | 0.9612 | 0.9626 | 0.9619 | 0.9612 | 0.9600 |
| Y−3 | 0.9619 | 0.9626 | 0.9400 | 0.9569 | 0.9626 | 0.9562 | 0.9626 | 0.9575 |
| Y−1, Y−2 | 0.9612 | 0.9513 | 0.9520 | 0.9619 | 0.9626 | 0.9605 | 0.9647 | 0.9591 |
| Y−1, Y−2, Y−3 | 0.9583 | 0.9548 | 0.9548 | 0.9682* | 0.9626 | 0.9682* | 0.9675 | 0.9621 |
| Mean | 0.9620 | 0.9560 | 0.9539 | 0.9606 | 0.9626 | 0.9599 | 0.9633 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
Table 6.
AUC-ROC calculated by type of model and period.
| Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.8018 | 0.7800 | 0.6449 | 0.8711 | 0.7618 | 0.8257 | 0.8529 | 0.7912 |
| Y−1 | 0.8089 | 0.7794 | 0.7693 | 0.8660 | 0.7932 | 0.8415 | 0.8467 | 0.8150 |
| Y−2 | 0.7764 | 0.6351 | 0.7995 | 0.8645 | 0.8219 | 0.8631 | 0.8704 | 0.8044 |
| Y−3 | 0.7723 | 0.7420 | 0.6120 | 0.8291 | 0.7360 | 0.8100 | 0.8202 | 0.7602 |
| Y−1, Y−2 | 0.8304 | 0.8051 | 0.6541 | 0.9149 | 0.8423 | 0.8655 | 0.8827 | 0.8279 |
| Y−1, Y−2, Y−3 | 0.7990 | 0.8030 | 0.7765 | 0.9390* | 0.8418 | 0.9194 | 0.9155 | 0.8563 |
| Mean | 0.7981 | 0.7574 | 0.7094 | 0.8808 | 0.7995 | 0.8542 | 0.8647 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
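The period labels in the table notes (Y, Y−1, Y−2, Y−3 and their combinations) can be read as feature windows. The sketch below shows one possible way to assemble such windows from firm-year data. It assumes that Y−k means the indicators observed k years before the year whose distress status is predicted; the `panel` layout, the `WINDOWS` mapping and the function name are hypothetical and are not taken from the paper.

```python
def build_lagged_features(panel, firm, year, lags):
    """Concatenate a firm's indicator vectors observed `lag` years before `year`.

    Returns None when any required firm-year observation is missing.
    """
    features = []
    for lag in lags:
        ratios = panel.get((firm, year - lag))
        if ratios is None:
            return None
        features.extend(ratios)
    return features


# Hypothetical toy panel: (firm, year) -> [return on equity, return on assets].
panel = {
    ("A", 2018): [5.0, 3.0],
    ("A", 2019): [4.0, 2.5],
    ("A", 2020): [-2.0, -1.0],
    ("B", 2020): [8.0, 4.0],
}

# Assumed mapping from the table's period labels (written with ASCII hyphens)
# to lag sets.
WINDOWS = {
    "Y": (0,),
    "Y-1": (1,),
    "Y-2": (2,),
    "Y-3": (3,),
    "Y-1, Y-2": (1, 2),
    "Y-1, Y-2, Y-3": (1, 2, 3),
}

x_a = build_lagged_features(panel, "A", 2021, WINDOWS["Y-1, Y-2, Y-3"])
x_b = build_lagged_features(panel, "B", 2021, WINDOWS["Y-1, Y-2, Y-3"])
```

Here firm A yields a six-element vector (two indicators for each of three lagged years), while firm B is skipped because its 2018 and 2019 observations are missing. Under this reading, wider windows give the model more history per firm but exclude firms with short records.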
Net cash flow from operations / Current liabilities
| Category | Variable | Indicator | Definition |
|---|---|---|---|
| Development ability | X10 | Year-on-year growth rate of operating revenue | Increase in current year's operating income / Total previous year's operating income |
| Development ability | X11 | Year-on-year growth rate of operating profit | Current year profit growth / Total previous year profit |
| Development ability | X12 | Year-on-year growth rate of operating cash flow | Current year operating net cash flow growth / Total previous year operating net cash flow |
| Development ability | X13 | Growth rate of net assets | Current year net assets growth / Total previous year net assets |
| Development ability | X14 | Growth rate of total assets | Current year total assets growth / Total previous year total assets |
| Operating ability | X15 | Account receivable turnover | Operating income / Average accounts receivable |
| Operating ability | X16 | Current asset turnover | Operating income / Average net current assets |
| Operating ability | X17 | Fixed assets turnover | Operating income / Average net fixed assets |
| Operating ability | X18 | Total assets turnover | Operating income / Average total assets |
| Cash flow ability | X19 | Sales-to-cash flow ratio | Net cash flow from operations / Operating income |
| Cash flow ability | X20 | Operating income cash ratio | Cash received from selling goods and providing services / Operating income |
| Cash flow ability | X21 | Cash recovery ratio of total assets | Net cash flow from operations / Total assets |
| Capital structure | X22 | Asset-liability ratio | Total liabilities / Total assets |
| Capital structure | X23 | Current assets ratio | Current assets / Total assets |
| Capital structure | X24 | Fixed assets ratio | Fixed assets / Total assets |
| Capital structure | X25 | Current debt ratio | Current debt / Total liabilities |
| Capital structure | X26 | Debt-to-long capital ratio | Total long-term liabilities / (Total long-term liabilities + shareholders' equity) |
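As a worked illustration of the definitions above, the following sketch computes four of the listed indicators (X10, X18, X22 and X26) from two consecutive annual statements. The `Statement` fields and the sample figures are hypothetical. Two further assumptions are made: growth and structure ratios are scaled to percentages, and "average total assets" is taken as the mean of the opening and closing balances. Neither convention is stated in this part of the text.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical container for the annual figures the ratios need."""
    operating_income: float
    total_assets: float
    total_liabilities: float
    long_term_liabilities: float
    equity: float


def x10_revenue_growth(cur, prev):
    """Year-on-year growth rate of operating revenue, in percent (assumed scaling)."""
    return (cur.operating_income - prev.operating_income) / prev.operating_income * 100.0


def x18_total_assets_turnover(cur, prev):
    """Operating income over average total assets (assumed: mean of two year-ends)."""
    return cur.operating_income / ((cur.total_assets + prev.total_assets) / 2.0)


def x22_asset_liability_ratio(cur):
    """Total liabilities over total assets, in percent (assumed scaling)."""
    return cur.total_liabilities / cur.total_assets * 100.0


def x26_debt_to_long_capital(cur):
    """Long-term liabilities over long-term liabilities plus equity, in percent."""
    return cur.long_term_liabilities / (cur.long_term_liabilities + cur.equity) * 100.0


# Made-up figures for one firm in two consecutive years.
prev = Statement(operating_income=1000.0, total_assets=2000.0,
                 total_liabilities=800.0, long_term_liabilities=300.0, equity=1200.0)
cur = Statement(operating_income=1200.0, total_assets=2400.0,
                total_liabilities=1080.0, long_term_liabilities=400.0, equity=1320.0)

ratios = {
    "X10": x10_revenue_growth(cur, prev),
    "X18": x18_total_assets_turnover(cur, prev),
    "X22": x22_asset_liability_ratio(cur),
    "X26": x26_debt_to_long_capital(cur),
}
```

Growth rates of this form become unstable when the previous-year base is close to zero, and their sign is hard to interpret when the base is negative, so extreme values in such indicators should be read with that in mind.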
| Variable | Mean | Max. | Min. | Std. Dev. | Skew. | Kurt. |
|---|---|---|---|---|---|---|
| X1 | 6.3313 | 5400.1671 | −1277.1794 | 62.5843 | 65.3038 | 5699.0080 |
| X2 | 3.9559 | 81.5616 | −97.5232 | 8.3262 | −1.6777 | 17.4696 |
| X3 | 2.6982 | 156.3490 | −5442.7347 | 76.1195 | −51.4017 | 3291.6990 |
| X4 | 21.5984 | 1750.3709 | −47.9374 | 27.3027 | 32.2398 | 1789.8810 |
| X5 | 2.3846 | 80.6637 | 0.0715 | 2.8005 | 8.7953 | 145.8414 |
| X6 | 1.8766 | 60.4173 | 0.0484 | 2.4454 | 8.2501 | 119.3816 |
| X7 | 1.2107 | 194.0507 | −181.8149 | 4.9871 | 14.0926 | 742.7602 |
| X8 | 0.1930 | 5.0480 | −11.7356 | 0.3834 | −0.3168 | 124.8073 |
| X9 | 0.2343 | 6.7687 | −15.3086 | 0.4877 | −1.0176 | 139.3556 |
| X10 | 19.9843 | 8269.9179 | −98.8233 | 137.1655 | 39.9626 | 1994.5710 |
| X11 | 39.6931 | 1398353 | −300540 | 14905.2300 | 83.3389 | 7974.3380 |
| X12 | 139.6043 | 145302.7150 | −66573.7540 | 2998.6370 | 26.1186 | 1282.3700 |
| X13 | 19.6424 | 17632.2353 | −98.5955 | 221.7191 | 60.5292 | 4381.8630 |
| X14 | 14.2982 | 1839.0951 | −92.9032 | 48.0487 | 17.4981 | 490.9641 |
| X15 | 40.0740 | 20873.5102 | 0.0353 | 442.8323 | 28.0359 | 955.5722 |
| X16 | 1.2767 | 12.0655 | 0.0061 | 0.8783 | 2.8939 | 18.3175 |
| X17 | 14.6846 | 55200.7412 | 0.0213 | 631.1668 | 78.1708 | 6431.8200 |
| X18 | 0.6724 | 7.7880 | 0.0060 | 0.4195 | 3.0022 | 24.3635 |
| X19 | 8.2708 | 161.9600 | −782.1000 | 22.9947 | −14.6610 | 426.9572 |
| X20 | 96.7410 | 663.5000 | 8.5200 | 22.4384 | 3.7638 | 70.6896 |
| X21 | 5.2778 | 92.0075 | −70.3538 | 7.3987 | −0.1228 | 10.5305 |
| X22 | 41.3800 | 105.8638 | 0.8359 | 19.0591 | 0.2962 | 2.5969 |
| X23 | 56.5613 | 99.8423 | 4.0784 | 16.6003 | −0.1688 | 2.6022 |
| X24 | 43.4387 | 95.9216 | 0.1577 | 16.6003 | 0.1688 | 2.6022 |
| X25 | 83.4102 | 121.7465 | 3.7539 | 14.8272 | −1.3100 | 4.6938 |
| X26 | 13.0065 | 99.1000 | −23.2981 | 14.9778 | 2.0032 | 8.2377 |

Note: 'Mean', 'Max.', 'Min.', 'Std. Dev.', 'Skew.' and 'Kurt.' denote the average value, maximum value, minimum value, standard deviation, skewness and kurtosis, respectively.
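For readers who want to reproduce statistics of this kind, the sketch below computes the six quantities named in the note using only the Python standard library. The text does not say which estimators were used, so the choices here (population moments and non-excess kurtosis, for which a normal distribution gives 3) are assumptions, and `describe` is an illustrative name.

```python
import math


def describe(values):
    """Mean, max, min, standard deviation, skewness and kurtosis of a sample.

    Uses population (divide-by-n) central moments; kurtosis is NOT excess
    kurtosis, so a normal distribution would give a value of 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "kurt": m4 / m2 ** 2,
    }


# Made-up sample with one large value, to show how a single outlier
# pulls skewness upward.
stats = describe([1.0, 2.0, 3.0, 4.0, 10.0])

# A symmetric sample has zero skewness.
symmetric = describe([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Very large skewness and kurtosis values, such as those reported for several indicators above, typically signal a few extreme firm-year observations rather than a broadly asymmetric distribution.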
Table 8.
AUC of out-of-sample forecasting.

| Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
|---|---|---|---|---|---|---|---|---|
| Y | 0.7810 | 0.8051 | 0.7324 | 0.9658 | 0.7629 | 0.9370 | 0.9131 | 0.8425 |
| Y−1 | 0.7876 | 0.8252 | 0.8082 | 0.9674 | 0.7856 | 0.9475 | 0.9297 | 0.8645 |
| Y−2 | 0.8530 | 0.8385 | 0.8080 | 0.9772 | 0.8184 | 0.9574 | 0.9404 | 0.8847 |
| Y−3 | 0.8044 | 0.8118 | 0.7779 | 0.9643 | 0.7372 | 0.9578 | 0.9001 | 0.8505 |
| Y−1, Y−2 | 0.8584 | 0.8779 | 0.8710 | 0.9730 | 0.8377 | 0.9515 | 0.9504 | 0.9028 |
| Y−1, Y−2, Y−3 | 0.8572 | 0.8788 | 0.8600 | 0.9901* | 0.8418 | 0.9874 | 0.9662 | 0.9116 |
| Mean | 0.8236 | 0.8396 | 0.8096 | 0.9730 | 0.7973 | 0.9564 | 0.9333 | |

Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
| Abbreviation | Explanation | Abbreviation | Explanation |
|---|---|---|---|
| AI | Artificial intelligence | NN | Neural network |
| ACC | Accuracy | RF | Random forest |
| AUC | Area under receiver operating characteristic curve | ROC | Receiver operating characteristic |
| CatBoost | Categorical boosting | Skew. | Skewness |
| Kurt. | Kurtosis | ST | Special treatment |
| Logistic | Logistic regression | Std. Dev. | Standard deviation |
| Max. | Maximum value | SVM | Support vector machines |
Figure 1. Comparison of different models using the ROC at the current period
Figure 2. Comparison of different models using the ROC at the 1-year horizon
Figure 3. Comparison of different models using the ROC at the 2-year horizon
Figure 4. Comparison of different models using the ROC at the 3-year horizon
Figure 5. Comparison of different models using the ROC at the 1-2-year horizon
Figure 6. Comparison of different models using the ROC at the 1-3-year horizon
Figure 7. Ten most important financial indicators at the 1-3-year horizon