Artificial intelligence techniques for financial distress prediction

  • Artificial intelligence (AI) models can effectively identify the financial risks existing in Chinese manufacturing enterprises. We use the financial ratios of 1668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An AI model is used to obtain the financial distress prediction value for the listed manufacturing enterprises. Our results show that the random forest model has high accuracy in terms of the empirical prediction of the financial distress of Chinese manufacturing enterprises, which reflects the effectiveness of the AI model in predicting the financial distress of the listed manufacturing enterprises. Profitability has the highest degree of importance for predicting financial distress in manufacturing firms, especially the return on equity. The results in this paper have good policy implications for how to use the AI model to improve the early warning and monitoring system of financial risks and enhance the ability of financial risk prevention and control.

    Citation: Junhao Zhong, Zhenzhen Wang. Artificial intelligence techniques for financial distress prediction[J]. AIMS Mathematics, 2022, 7(12): 20891-20908. doi: 10.3934/math.20221145


    Under strong downward pressure on the economy, the number of business failures, and even bankruptcies, caused by corporate financial distress is increasing. The financial crisis of a listed company not only brings large investment risks to investors and creditors, but also triggers a series of chain reactions that call into question the standardization of the securities market and the governance ability of listed companies. Research on the financial risk prediction of listed companies helps to avoid investment risks in a timely manner, protects the rights and interests of stakeholders and plays a positive role in reshaping investor confidence. Chinese president Xi Jinping has repeatedly stressed that we must deeply grasp the characteristics of artificial intelligence development, strengthen the integration of artificial intelligence and industrial development and provide new momentum for high-quality development. Applying artificial intelligence technology to improve the prediction of financial distress in manufacturing enterprises can enable the intelligent supervision and high-quality development of technology-enabled finance and help prevent systemic financial risks.

    In the existing literature, models for financial distress prediction mainly focus on calculating the default probability and predicting bankruptcy to estimate financial risk. Altman [1] pioneered the use of financial ratios to predict corporate bankruptcy. Since then, most research has used financial ratios to predict financial distress. Classical statistical models to calculate the default probability or default premium include linear probability, logit, probit and linear discriminant analysis (Z-score) models [2,3,4]. What these models have in common is the inclusion of highly correlated financial indicators, such as the use of a combination of five (or seven) financial ratios to accurately predict corporate bankruptcy [5,6,7,8]. The methods differ as follows: the linear probability model predicts financial risk by linear regression; in the logit model, the residuals follow a standard logistic distribution, while in the probit model, they follow a standard normal distribution; and linear discriminant analysis produces an overall discriminant score by means of a multivariate discriminant model. These classical statistical models have remained widely used since the 1960s because their formulas are simple and easy to understand and they are not difficult to implement operationally [9,10,11]. However, the linearity, normality and independence assumptions of statistical models are difficult to satisfy at the same time, so there are natural flaws in their validity and applicability [5,12,13,14].

    Monitoring methods such as artificial intelligence and data mining have developed rapidly in recent years. The use of artificial intelligence models such as support vector machines [15], decision trees [16] and neural networks [17] to predict corporate financial distress has gradually become mainstream. Compared with traditional statistical models, artificial intelligence models do not make strictly restrictive assumptions about the distribution of data, and they can also effectively handle big data [12,18]. The stability, accuracy and applicability of the forecast results have been greatly improved [19]. While there are many artificial intelligence models used to predict corporate financial distress, a system of indicators with multiple data types can make a single classification method perform poorly. More specifically, single decision trees and neural networks tend to underperform ensemble learning methods when dealing with multiple data types. In addition, the principle of neural network modeling is like putting monitoring indicators into a black box and then setting the hidden layer and node number to get monitoring results, which makes the interpretation of neural networks difficult [20,21,22]. Since Chen and Guestrin [23] proposed extreme gradient boosting (XGBoost), empirical studies on bankruptcy prediction have achieved better prediction results. Carmona et al. [24] used the XGBoost algorithm to predict bank failure and found that it has better discrimination performance in predicting financial distress than other methods. Ensemble learning can use some (different) means to change the distribution of the original training samples so as to build multiple different classifiers, as well as a linear combination of these classifiers to obtain a more powerful classifier to make the final decision [12]. Many studies have compared the performance of various traditional and artificial intelligence models, such as logit regression, probit regression, linear discriminant analysis, decision tree, neural network, support vector machine, random forest and ensemble learning models [2,12,23,25,26]. Ensemble learning avoids overfitting, has good generalization performance and shows the best prediction performance, especially when the gradient boosting tree is applied in an ensemble learning model.

    At present, there is little literature on the application of artificial intelligence models to the prediction of the financial distress of Chinese manufacturing enterprises. How can the accuracy of the financial distress prediction of Chinese manufacturing enterprises be improved? To answer this question, we use the financial ratios of 1,668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An artificial intelligence model is used to obtain the financial distress prediction values for the listed manufacturing enterprises. Our results can provide an adequate early warning of the financial risks of Chinese manufacturing enterprises, as well as help the financial management departments and provincial governments to strengthen the monitoring and early warning of financial risks in the manufacturing industry. Compared with the existing literature, the marginal contributions of this paper are mainly reflected in the following two points. First, we use seven statistical and artificial intelligence models to predict the financial distress of the listed manufacturing enterprises. From the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and categorical boosting (CatBoost) models, we select the most suitable model for the financial distress prediction of China's listed manufacturing enterprises. We provide evidence for the selection of a financial distress prediction model for manufacturing enterprises and provide a new empirical basis for the application of an artificial intelligence model in China's financial risk monitoring and early warning. Second, our empirical study of financial distress prediction of listed manufacturing firms extends the related research on corporate risk and intelligent regulation.

    The remainder of the paper is organized as follows. Section 2 presents a financial distress prediction framework with an artificial intelligence model. We describe the random forest model and the model evaluation method. Section 3 presents an empirical study based on the sample data. We show the predictive performance and important indicators. Section 4 concludes the paper with conclusions and policy implications.

    We have chosen the random forest model to predict financial distress for the following reasons:

    First, artificial intelligence models do not make strictly restrictive assumptions about the distribution of data as compared with traditional statistical models. The stability, accuracy and applicability of the forecast results have been greatly improved [19]. Single decision trees and neural networks tend to underperform ensemble learning methods when dealing with multiple data types. In addition, the principle of the neural network is like putting monitoring indicators into a black box and then setting the hidden layer and node number to get monitoring results, which makes the interpretation of the neural network difficult [20,21,22]. Ensemble learning can use some (different) means to change the distribution of the original training samples to build multiple different classifiers; it can also use the linear combination of these classifiers to obtain a more powerful classifier to make the final decision [12].

    Second, the random forest model is favored by more and more scholars. The random forest is the most common and widely used supervised machine learning method among ensemble learning algorithms. Moreover, according to the existing research, the random forest model performs best among machine learning models, with a prediction accuracy above 85% [19]. The random forest model not only has high prediction accuracy, but it can also balance the error. Since companies in financial distress account for a small proportion of the overall sample, the sample of this study is unbalanced. Because the random forest model can balance the errors, it is suitable for our study.

    Third, after training the samples, the random forest model can evaluate each feature by using the Gini coefficient. This feature reduces the black box nature of machine learning methods.
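
    To make the Gini-based feature evaluation more concrete, the following is a minimal sketch of the Gini impurity of a node and of the impurity decrease produced by one candidate split, which is the kind of quantity that tree-based importance scores typically accumulate. It is an illustration under stated assumptions (binary labels, hypothetical function names), not the implementation used in this paper.

```python
def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node: 0 = healthy, 1 = distressed, split perfectly by one ratio.
parent = [0, 0, 0, 1, 1, 1]
print(gini_impurity(parent))
print(gini_decrease(parent, [0, 0, 0], [1, 1, 1]))
```

    A feature whose splits produce large impurity decreases across many trees would rank as important under this kind of measure.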

    The random forest is a popular machine learning model that is widely used in credit scoring and bankruptcy prediction. The random forest was first proposed by Breiman [27], and it is a classifier containing multiple decision trees. The basic idea of a random forest is to combine multiple weak decision trees into a strong decision tree to improve the performance of a single decision tree. Then, the mode of feature classification is obtained by voting for each tree so as to predict the final result. Based on the setting utilized by Katuwal et al. [28], we set the random forest model as follows:

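    $$ p(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x) \tag{1} $$

    where $p_t(c \mid x)$ is the probability distribution of each tree $t$ over the classes $c$ for an input $x$, and $T$ is the number of trees.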
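
    As a minimal sketch of the idea behind Eq. (1), assuming each tree outputs a class-probability vector and the forest takes their unweighted mean, the combination step could look as follows. The function names and the toy numbers are illustrative and do not come from the paper.

```python
def forest_probability(tree_probs):
    """Average per-tree class distributions into one forest distribution.

    tree_probs: list of probability vectors, one per tree.
    """
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    return [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]


def forest_predict(tree_probs):
    """Return the index of the class with the highest averaged probability."""
    avg = forest_probability(tree_probs)
    return max(range(len(avg)), key=lambda c: avg[c])


# Three hypothetical trees scoring one firm as (healthy, distressed).
toy_trees = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
print(forest_probability(toy_trees))
print(forest_predict(toy_trees))
```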

    The random forest is compared with benchmark methods, namely, logistic regression, neural networks, decision trees, support vector machines, XGBoost and CatBoost, to verify its actual prediction performance.

    The logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models were all built in R. In terms of parameter selection, the parameters of the logistic regression and decision tree were set to default values. However, in order to improve classification performance and avoid overfitting, we performed hyperparameter tuning on the neural network, random forest, support vector machine, XGBoost and CatBoost models by using grid search and cross-validation. Grid search is a parameter optimization technique that is widely used in the parameter tuning of artificial intelligence models. The optimal parameter settings of the artificial intelligence models described in this paper are shown in Table 1.

    Table 1.  Tuning hyperparameters for models; neural network: NN, random forest: RF, decision tree: Tree, support vector machine: SVM.
    | Model | Initial parameters | Optimal parameters |
    |---|---|---|
    | NN | size=(10, 11, 12, 13, 14, 15) | size=10 |
    | NN | decay=(0.01, 0.05, 0.1) | decay=0.01 |
    | RF | mtry=(2, 14, 26) | mtry=14 |
    | SVM | sigma=0.068 | sigma=0.068 |
    | SVM | cost=(0.25, 0.5, 1, 2, 4, 8, 16) | cost=1 |
    | XGBoost | max_depth=(4, 6, 8) | max_depth=6 |
    | XGBoost | eta=(0.1, 0.5) | eta=0.5 |
    | XGBoost | nround=25 | nround=25 |
    | CatBoost | iterations=100 | iterations=100 |
    | CatBoost | thread_count=10 | thread_count=10 |
    | CatBoost | border_count=(32, 64) | border_count=32 |
    | CatBoost | depth=(4, 6, 8) | depth=6 |
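
    The tuning procedure, grid search combined with cross-validation, can be sketched in a few lines. The example below is only an illustration of the technique: it uses a toy one-feature k-nearest-neighbour classifier and made-up data rather than the R models and financial ratios of this paper, and all function names are hypothetical.

```python
from itertools import product


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D toy model)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(xs, ys, params, folds=4):
    """Cross-validated accuracy of the toy model for one parameter setting."""
    correct = 0
    for train, valid in k_fold_indices(len(xs), folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], params["k"]) == ys[i] for i in valid)
    return correct / len(xs)


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and keep the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: one feature, label 1 when the feature is large.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
best, score = grid_search({"k": [1, 3, 5]}, lambda p: cv_accuracy(xs, ys, p))
print(best, score)
```

    Each candidate setting is scored only on folds that were held out from fitting, which is what lets the search guard against overfitting.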

    We use accuracy (ACC) and the area under the receiver operating characteristic curve (AUC-ROC) to compare the predictive power of the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models for financial distress. AUC can be understood as the probability that, when one positive sample and one negative sample are drawn at random, the model assigns a higher score to the positive sample than to the negative one. Before defining ACC and AUC, we define the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts and the TP rate (TPR), FP rate (FPR), TN rate (TNR) and FN rate (FNR) according to the confusion matrix shown in Table 2.

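    Table 2.  Confusion matrix (rows: predicted class; columns: actual class).
    | Predicted \ Actual | 1 | 0 | Total |
    |---|---|---|---|
    | 1 | TP | FP | TP + FP |
    | 0 | FN | TN | FN + TN |
    | Total | TP + FN | FP + TN | TP + TN + FP + FN |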

    Based on Table 2, the definitions of TPR, FPR, TNR, FNR, ACC and AUC are as follows:

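    $$ \mathrm{TPR} = \frac{TP}{TP + FN} \tag{2} $$
    $$ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{3} $$
    $$ \mathrm{TNR} = \frac{TN}{FP + TN} \tag{4} $$
    $$ \mathrm{FNR} = \frac{FN}{TP + FN} \tag{5} $$
    $$ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6} $$
    $$ \mathrm{AUC} = \int_0^1 \mathrm{TPR} \, d(\mathrm{FPR}) \tag{7} $$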
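
    The metrics built on Table 2 can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts and scores; `confusion_rates` and `auc_pairwise` are illustrative names, and the AUC is computed through its pairwise-ranking interpretation rather than by integrating a curve.

```python
def confusion_rates(tp, fp, tn, fn):
    """Return TPR, FPR, TNR, FNR and ACC from the four confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
        "FNR": fn / (tp + fn),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
    }


def auc_pairwise(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rates = confusion_rates(tp=8, fp=5, tn=85, fn=2)  # hypothetical counts
print(rates)
print(auc_pairwise([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

    Since only about 5% of the observations in this study are in financial distress, a rule that always predicts "healthy" would already reach roughly 95% accuracy, which is one reason to read ACC alongside AUC.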

    According to the actual situation of financial distress in China's manufacturing industry and scholars' attention to the financial indicators of the manufacturing industry [2,9,29,30], we divided the collected financial ratios into six dimensions to predict the financial distress of the manufacturing industry, namely, profitability, solvency, development ability, operating ability, cash flow ability and capital structure, as shown in Table 3.

    Table 3.  Definitions of explanatory variables.
    | Financial dimension | Variable | Definition | Formula |
    |---|---|---|---|
    | Profitability | X1 | Return on equity | Net income / Shareholders' equity |
    | Profitability | X2 | Return on assets | Net income / Total assets |
    | Profitability | X3 | Net profit margin | Net income / Operating revenue |
    | Profitability | X4 | Expense to sales | Sales expenses / Operating revenue |
    | Solvency | X5 | Current ratio | Current assets / Current liabilities |
    | Solvency | X6 | Liquidity ratio | Current asset stocks / Current liabilities |
    | Solvency | X7 | Equity ratio | Total liabilities / Shareholders' equity |
    | Solvency | X8 | Operating cash flow-to-debt ratio | Net cash flow from operations / Total debt |
    | Solvency | X9 | Operating cash flow-to-current liabilities ratio | Net cash flow from operations / Current liabilities |
    | Development ability | X10 | Year-on-year growth rate of operating revenue | Increase in current year's operating income / Total previous year's operating income |
    | Development ability | X11 | Year-on-year growth rate of operating profit | Current year profit growth / Total previous year profit |
    | Development ability | X12 | Year-on-year growth rate of operating cash flow | Current year operating net cash flow growth / Total previous year operating net cash flow |
    | Development ability | X13 | Growth rate of net assets | Current year net assets growth / Total previous year net assets |
    | Development ability | X14 | Growth rate of total assets | Current year total assets growth / Total previous year total assets |
    | Operating ability | X15 | Account receivable turnover | Operating income / Average accounts receivable |
    | Operating ability | X16 | Current asset turnover | Operating income / Average net current assets |
    | Operating ability | X17 | Fixed assets turnover | Operating income / Average net fixed assets |
    | Operating ability | X18 | Total assets turnover | Operating income / Average total assets |
    | Cash flow ability | X19 | Sales-to-cash flow ratio | Net cash flow from operations / Operating income |
    | Cash flow ability | X20 | Operating income cash ratio | Cash received from selling goods and providing services / Operating income |
    | Cash flow ability | X21 | Cash recovery ratio of total assets | Net cash flow from operations / Total assets |
    | Capital structure | X22 | Asset-liability ratio | Total liabilities / Total assets |
    | Capital structure | X23 | Current assets ratio | Current assets / Total assets |
    | Capital structure | X24 | Fixed assets ratio | Fixed assets / Total assets |
    | Capital structure | X25 | Current debt ratio | Current debt / Total liabilities |
    | Capital structure | X26 | Debt-to-long capital ratio | Total long-term liabilities / (Total long-term liabilities + shareholders' equity) |

    According to the industry classification guidelines of listed companies issued by the China Securities Regulatory Commission (CSRC) in 2012, we selected 1,668 Chinese A-share listed manufacturing enterprises as samples after eliminating samples with considerable missing data. We applied the artificial intelligence model introduced in Section 2 to predict the financial distress of manufacturing enterprises. The sample data were annual data from 2016 to 2021. The financial data of the listed manufacturing enterprises were mainly from the Wind database. In general, financial distress leads to an erosion of a firm's profitability. Much of the research in this area has looked at bankruptcy as a result of financial distress. However, data related to the bankruptcies of listed companies in China are difficult to obtain. As an alternative, the "Special Treatment" (ST) designation of a company, as defined by the CSRC, may be seen as close to describing financial distress [31,32,33,34]. We define whether a manufacturing firm is in financial distress in a given year based on the timing of ST or delisting: the year of ST or delisting is recorded as the year in which the listed manufacturing enterprise is in financial distress. After eliminating the missing values, there were still 10,007 observations for the 1,668 Chinese A-share listed manufacturing enterprises, among which 509 were in financial distress, accounting for 5.09% of the total sample. Using random sampling, 70% of the sample was used as the training set and 30% as the test set. The descriptive statistics of each indicator are shown in Table 4.

    Table 4.  Descriptive statistics.
    | Variable | Mean | Max. | Min. | Std. Dev. | Skew. | Kurt. |
    |---|---|---|---|---|---|---|
    | X1 | 6.3313 | 5400.1671 | −1277.1794 | 62.5843 | 65.3038 | 5699.0080 |
    | X2 | 3.9559 | 81.5616 | −97.5232 | 8.3262 | −1.6777 | 17.4696 |
    | X3 | 2.6982 | 156.3490 | −5442.7347 | 76.1195 | −51.4017 | 3291.6990 |
    | X4 | 21.5984 | 1750.3709 | −47.9374 | 27.3027 | 32.2398 | 1789.8810 |
    | X5 | 2.3846 | 80.6637 | 0.0715 | 2.8005 | 8.7953 | 145.8414 |
    | X6 | 1.8766 | 60.4173 | 0.0484 | 2.4454 | 8.2501 | 119.3816 |
    | X7 | 1.2107 | 194.0507 | −181.8149 | 4.9871 | 14.0926 | 742.7602 |
    | X8 | 0.1930 | 5.0480 | −11.7356 | 0.3834 | −0.3168 | 124.8073 |
    | X9 | 0.2343 | 6.7687 | −15.3086 | 0.4877 | −1.0176 | 139.3556 |
    | X10 | 19.9843 | 8269.9179 | −98.8233 | 137.1655 | 39.9626 | 1994.5710 |
    | X11 | 39.6931 | 1398353 | −300540 | 14905.2300 | 83.3389 | 7974.3380 |
    | X12 | 139.6043 | 145302.7150 | −66573.7540 | 2998.6370 | 26.1186 | 1282.3700 |
    | X13 | 19.6424 | 17632.2353 | −98.5955 | 221.7191 | 60.5292 | 4381.8630 |
    | X14 | 14.2982 | 1839.0951 | −92.9032 | 48.0487 | 17.4981 | 490.9641 |
    | X15 | 40.0740 | 20873.5102 | 0.0353 | 442.8323 | 28.0359 | 955.5722 |
    | X16 | 1.2767 | 12.0655 | 0.0061 | 0.8783 | 2.8939 | 18.3175 |
    | X17 | 14.6846 | 55200.7412 | 0.0213 | 631.1668 | 78.1708 | 6431.8200 |
    | X18 | 0.6724 | 7.7880 | 0.0060 | 0.4195 | 3.0022 | 24.3635 |
    | X19 | 8.2708 | 161.9600 | −782.1000 | 22.9947 | −14.6610 | 426.9572 |
    | X20 | 96.7410 | 663.5000 | 8.5200 | 22.4384 | 3.7638 | 70.6896 |
    | X21 | 5.2778 | 92.0075 | −70.3538 | 7.3987 | −0.1228 | 10.5305 |
    | X22 | 41.3800 | 105.8638 | 0.8359 | 19.0591 | 0.2962 | 2.5969 |
    | X23 | 56.5613 | 99.8423 | 4.0784 | 16.6003 | −0.1688 | 2.6022 |
    | X24 | 43.4387 | 95.9216 | 0.1577 | 16.6003 | 0.1688 | 2.6022 |
    | X25 | 83.4102 | 121.7465 | 3.7539 | 14.8272 | −1.3100 | 4.6938 |
    | X26 | 13.0065 | 99.1000 | −23.2981 | 14.9778 | 2.0032 | 8.2377 |
    Note: 'Mean', 'Max.', 'Min.', 'Std. Dev.', 'Skew.' and 'Kurt.' denote the average value, maximum value, minimum value, standard deviation, skewness and kurtosis, respectively.
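
    The random 70/30 division of the sample into training and test sets can be illustrated with a short sketch. It assumes a plain shuffle-and-cut split with a fixed seed; the paper does not state how the split was implemented, so the function name, the seed and the stand-in label vector are assumptions made for illustration.

```python
import random


def train_test_split(rows, train_share=0.7, seed=0):
    """Shuffle a copy of rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 10,007 observations, 509 of them labelled as distressed (1).
labels = [1] * 509 + [0] * (10007 - 509)
train, test = train_test_split(labels)
print(len(train), len(test))
print(round(sum(labels) / len(labels), 4))
```

    With so few positive cases, a purely random split can leave the two parts with slightly different distress shares; a stratified split is a common alternative, although the paper does not report using one.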

    We compare the predictive performance of the logistic regression, neural network, decision tree, random forest, support vector machine, XGBoost and CatBoost models for the financial distress of listed manufacturing firms, using two evaluation metrics: ACC and AUC-ROC. Tables 5 and 6 show the ACC and AUC results of these seven financial distress prediction models when financial indicators with different numbers of early warning periods are used to predict the financial distress of the listed manufacturing enterprises. In Tables 5 and 6, a warning lead time of 1 year (Y−1) means that the prediction of whether a firm is in financial distress in year Y is obtained from the data in year Y−1; the other lead times are defined analogously.

    Table 5.  ACC of different prediction models by period; logistic regression: Logistic.
    | Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
    |---|---|---|---|---|---|---|---|---|
    | Y | 0.9647 | 0.9562 | 0.9583 | 0.9569 | 0.9626 | 0.9562 | 0.9605 | 0.9593 |
    | Y−1 | 0.9647 | 0.9569 | 0.9597 | 0.9583 | 0.9626 | 0.9562 | 0.9633 | 0.9603 |
    | Y−2 | 0.9612 | 0.9541 | 0.9583 | 0.9612 | 0.9626 | 0.9619 | 0.9612 | 0.9600 |
    | Y−3 | 0.9619 | 0.9626 | 0.9400 | 0.9569 | 0.9626 | 0.9562 | 0.9626 | 0.9575 |
    | Y−1, Y−2 | 0.9612 | 0.9513 | 0.9520 | 0.9619 | 0.9626 | 0.9605 | 0.9647 | 0.9591 |
    | Y−1, Y−2, Y−3 | 0.9583 | 0.9548 | 0.9548 | 0.9682* | 0.9626 | 0.9682* | 0.9675 | 0.9621 |
    | Mean | 0.9620 | 0.9560 | 0.9539 | 0.9606 | 0.9626 | 0.9599 | 0.9633 | |
    Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.

    Table 6.  AUC-ROC calculated by type of model and period.
    | Year | Logistic | Tree | NN | RF | SVM | XGBoost | CatBoost | Mean |
    |---|---|---|---|---|---|---|---|---|
    | Y | 0.8018 | 0.7800 | 0.6449 | 0.8711 | 0.7618 | 0.8257 | 0.8529 | 0.7912 |
    | Y−1 | 0.8089 | 0.7794 | 0.7693 | 0.8660 | 0.7932 | 0.8415 | 0.8467 | 0.8150 |
    | Y−2 | 0.7764 | 0.6351 | 0.7995 | 0.8645 | 0.8219 | 0.8631 | 0.8704 | 0.8044 |
    | Y−3 | 0.7723 | 0.7420 | 0.6120 | 0.8291 | 0.7360 | 0.8100 | 0.8202 | 0.7602 |
    | Y−1, Y−2 | 0.8304 | 0.8051 | 0.6541 | 0.9149 | 0.8423 | 0.8655 | 0.8827 | 0.8279 |
    | Y−1, Y−2, Y−3 | 0.7990 | 0.8030 | 0.7765 | 0.9390* | 0.8418 | 0.9194 | 0.9155 | 0.8563 |
    | Mean | 0.7981 | 0.7574 | 0.7094 | 0.8808 | 0.7995 | 0.8542 | 0.8647 | |
    Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
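
    The lead-period rows Y−1, Y−2 and Y−3 correspond to lagged copies of the financial ratios. The sketch below shows one way such lagged features could be assembled from a firm-year panel; the `lagK_Xn` naming pattern, the function name and the toy values are assumptions made for illustration, not the paper's data pipeline.

```python
def add_lags(panel, variables, lags):
    """Attach lagged ratios to each firm-year.

    panel: dict mapping (firm, year) -> dict of ratio values.
    Returns features only for firm-years where every requested lag exists.
    """
    rows = {}
    for firm, year in panel:
        features = {}
        complete = True
        for lag in lags:
            past = panel.get((firm, year - lag))
            if past is None:
                complete = False
                break
            for var in variables:
                features[f"lag{lag}_{var}"] = past[var]
        if complete:
            rows[(firm, year)] = features
    return rows


# Hypothetical return-on-equity (X1) values for one firm.
panel = {
    ("A", 2018): {"X1": 8.0},
    ("A", 2019): {"X1": 5.0},
    ("A", 2020): {"X1": -2.0},
    ("A", 2021): {"X1": -9.0},
}
print(add_lags(panel, ["X1"], lags=[1, 2, 3]))
```

    Requiring all three lags leaves fewer usable firm-years than a single lag does, which is worth keeping in mind when comparing rows of Tables 5 and 6.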

    First, the CatBoost model has the highest average ACC (96.33%) for financial distress prediction in different warning periods. The random forest has the highest average AUC (0.8808) for financial distress prediction in different warning periods. All models had an average ACC of more than 95% for financial distress predictions. In addition, the random forest, XGBoost, and CatBoost models had average AUC values of over 0.85. Similar to the results obtained in existing literature, the random forest and gradient boosting algorithms showed the best prediction performance for a single model prediction framework. For example, Ben Jabeur et al. [2] compared multiple models to predict the financial distress of French enterprises; they found that only the random forest, XGBoost and CatBoost models achieved an average accuracy of more than 80%. Xia et al. [35] used linear regression, random forest and gradient boosting algorithms to score the credit of P2P lending customers, and the results showed that the prediction performance of the random forest and CatBoost models was better than that of traditional statistical models.
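
    The column means quoted above can be rechecked from the tabulated values. The short sketch below copies the random forest AUC column from Table 6 and the CatBoost ACC column from Table 5 and averages them; it is only an arithmetic check, and the variable names are illustrative.

```python
# Values copied from the RF column of Table 6 (AUC) and the
# CatBoost column of Table 5 (ACC), one entry per warning period.
rf_auc = [0.8711, 0.8660, 0.8645, 0.8291, 0.9149, 0.9390]
catboost_acc = [0.9605, 0.9633, 0.9612, 0.9626, 0.9647, 0.9675]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(round(mean(rf_auc), 4))
print(round(mean(catboost_acc), 4))
```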

    Second, in terms of the number of early warning periods, financial indicators from the combined 1-3-year lead periods (Y−1, Y−2, Y−3) had the best early warning effect for the financial distress of the listed manufacturing enterprises. Specifically, with this combined lead period, the average ACC and average AUC across models were both the maximum values, i.e., 96.21% and 0.8563, respectively. In addition, the average accuracy of all warning lead periods was over 95%. For single lead periods, the average accuracy falls from 96.03% at 1 year to 95.75% at 3 years, which is consistent with most reports, in which accuracy decreases as the number of early warning periods increases [2].

    Finally, the ROC curve was drawn based on the results of the TPR (also known as sensitivity) and TNR (also known as specificity) obtained by setting different classification thresholds. Figures 1-6 show the ROC curves of the seven financial distress prediction models at different warning periods. It can be ascertained from the ROC curve that the performance of the financial distress prediction model is obviously divided into three levels. The first level consists of random forest, XGBoost and CatBoost algorithms. The second level is composed of logistic regression, decision tree and support vector machine algorithms. The third level is a neural network.

    Figure 1.  Comparison of different models using the ROC at the current period.
    Figure 2.  Comparison of different models using the ROC at the 1-year horizon.
    Figure 3.  Comparison of different models using the ROC at the 2-year horizon.
    Figure 4.  Comparison of different models using the ROC at the 3-year horizon.
    Figure 5.  Comparison of different models using the ROC at the 1-2-year horizon.
    Figure 6.  Comparison of different models using the ROC at the 1-3-year horizon.
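
    The threshold sweep behind ROC curves such as those in Figures 1-6 can be sketched as follows. The scores and labels are made up, the function names are hypothetical, and the trapezoidal rule is one common way to turn the resulting points into an AUC value; the paper does not state which routine produced its curves.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by thresholding at each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
print(pts)
print(trapezoid_area(pts))
```

    Lowering the threshold moves along the curve from the origin towards (1, 1): more distressed firms are caught, at the cost of more false alarms.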

    In general, the random forest model is highly effective at predicting the financial distress of listed manufacturing enterprises based on the data of a 1-3-year early warning period, with the ACC and AUC both reaching their maximum values, i.e., 96.82% (shared with XGBoost in Table 5) and 0.9390, respectively. Specificity and sensitivity also confirmed excellent performance at different classification thresholds. This also suggests that the data of a 1-3-year early warning period are more consistent with the financial distress warning period of China's listed manufacturing enterprises, while the random forest model can better describe the impact of financial indicators on financial distress; the combination of the two achieves a better warning effect.

    The importance of indicators can reflect the contribution of a single financial indicator to the prediction of the financial distress of listed manufacturing enterprises. Figure 7 depicts the 10 most important financial indicators when the random forest model uses the data of the 1-3-year warning period to predict the financial distress of listed manufacturing enterprises. We calculated the degree of accuracy decline after removing a certain indicator as the importance result of the indicator.

    Figure 7.  Ten most important financial indicators at the 1-3-year horizon.
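
    The importance measure described above, the decline in accuracy after removing an indicator, can be sketched as follows. The sketch reads "removing" literally as refitting without the indicator (permutation-based variants also exist), and it uses a toy nearest-centroid classifier with made-up data in place of the random forest; all names and values are hypothetical.

```python
def centroid_accuracy(rows, labels, names):
    """Training accuracy of a nearest-centroid classifier on the chosen features."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(r[n] for r in members) / len(members) for n in names]

    def dist(row, center):
        return sum((row[n] - c) ** 2 for n, c in zip(names, center))

    c0, c1 = centroid(0), centroid(1)
    preds = [1 if dist(r, c1) < dist(r, c0) else 0 for r in rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def drop_importance(features, fit_and_score):
    """Accuracy decline when each feature in turn is left out of the model."""
    baseline = fit_and_score(features)
    return {
        name: baseline - fit_and_score([f for f in features if f != name])
        for name in features
    }


# Toy data: X1 separates the classes, X5 carries no signal.
rows = [
    {"X1": 10.0, "X5": 2.0},
    {"X1": 8.0, "X5": 1.0},
    {"X1": -5.0, "X5": 2.0},
    {"X1": -7.0, "X5": 1.0},
]
labels = [0, 0, 1, 1]
importance = drop_importance(
    ["X1", "X5"], lambda names: centroid_accuracy(rows, labels, names)
)
print(importance)
```

    An indicator whose removal barely changes accuracy may still matter if it is strongly correlated with another indicator that remains in the model, so such scores are best read as relative rankings.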

    These results indicate that profitability deserves special emphasis when predicting the financial distress of China's listed manufacturing enterprises, especially the return on equity (X1). This is similar to the conclusion of Mohamed et al. [36], who used an adaptive neuro-fuzzy model to investigate the relationship between a firm's performance indicators and share price prediction and found that the return on equity was the most significant predictor. Profitability (X1, X2, X3 and X4) is the most important of all dimensions for predicting financial distress: excluding the indicators of the profitability dimension decreases the prediction accuracy of the model by 5.70%. The reason may be as stated by Zhou et al. [37]: poor profitability leads to low financial performance, greater operational risk and limited growth and development. Return on equity at the 2-year horizon (lag2_X1) had the highest importance (1.17%) among all indicators across the different warning periods. Second was the return on assets at the 2-year horizon (lag2_X2), with an importance of 0.80%. The net profit margin at the 1-year horizon (lag_X3) ranked third, with an importance of 0.75%. As a single indicator, return on equity (X1) made the largest contribution to the financial distress prediction of the listed manufacturing enterprises, reaching 1.91%. These results suggest that manufacturing enterprises fall into financial distress mainly because of low profit margins. The return on equity (X1) and the other profitability indicators can effectively predict the current profits and losses of manufacturing enterprises and, to a large extent, the probability of their future financial distress. In recent years, China's financing policy has continued to tighten, and the financing cost of manufacturing enterprises has risen. During the COVID-19 epidemic, prevention and control measures led to the shutdown of most manufacturing enterprises, which seriously hindered their normal production and operation and exposed them to many risks, such as supply chain disruption. This has also become the main reason why most manufacturing companies are in financial trouble. The importance of return on equity (X1), return on assets (X2) and net profit margin (X3) ranked as the top three, again indicating that profitability is an indispensable part of the financial distress early warning system of Chinese manufacturing enterprises.
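    The per-indicator figure quoted above (1.91% for X1) is presumably the sum of that indicator's importance over its lagged copies; the text does not state this explicitly, so the following is only a sketch under that assumption. The three values marked "reported" come from the text, while the remaining entries are hypothetical placeholders chosen so that X1 sums to the reported 1.91%; the true split across lags is not given. The helper names (`split_name`, `by_indicator`, `dimension_total`) are illustrative and not from the paper.

```python
import re
from collections import defaultdict

# Importance values in percent, keyed by lagged feature name.
importance = {
    "lag2_X1": 1.17,  # reported: return on equity, 2-year horizon
    "lag2_X2": 0.80,  # reported: return on assets, 2-year horizon
    "lag_X3": 0.75,   # reported: net profit margin, 1-year horizon
    "lag_X1": 0.40,   # HYPOTHETICAL placeholder
    "lag3_X1": 0.34,  # HYPOTHETICAL placeholder
    "lag3_X9": 0.20,  # HYPOTHETICAL placeholder, non-profitability indicator
}

# Profitability dimension as listed in the text.
PROFITABILITY = {"X1", "X2", "X3", "X4"}

# "lag_X3" denotes a 1-year lag and "lag2_X1" a 2-year lag, as in the text.
LAG_NAME = re.compile(r"^lag(\d*)_(X\d+)$")


def split_name(name):
    """Return (lag_in_years, base_indicator) for a lagged feature name."""
    match = LAG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected feature name: {name!r}")
    lag = int(match.group(1)) if match.group(1) else 1
    return lag, match.group(2)


def by_indicator(values):
    """Sum importance over all lags of each base indicator."""
    totals = defaultdict(float)
    for name, value in values.items():
        _, indicator = split_name(name)
        totals[indicator] += value
    return dict(totals)


def dimension_total(values, indicators):
    """Sum importance over every lagged feature belonging to a dimension."""
    return sum(v for n, v in values.items() if split_name(n)[1] in indicators)


x1_total = round(by_indicator(importance)["X1"], 2)
profitability_total = round(dimension_total(importance, PROFITABILITY), 2)
```

    Summed importance of this kind is a different quantity from the 5.70% accuracy drop reported for excluding the profitability dimension, which requires refitting or re-scoring the model without those features.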

    The previous empirical results show that the random forest model performed best in financial distress prediction. It can identify companies in financial distress with high accuracy. Using data for an early warning period of 1-3 years, the random forest predicted financial distress with an accuracy of 96.82% and an AUC of 0.9390. To further test the forecasting performance of the random forest model on financial distress, we used a sample of listed manufacturing firms in China in 2021 for out-of-sample forecasting. There were 124 listed manufacturing companies in financial distress and 1,537 healthy companies in 2021. The model trained in the previous section was used for out-of-sample prediction, and the prediction results are shown in Tables 7 and 8. In the out-of-sample prediction of the financial distress of China's listed manufacturing enterprises in 2021, the random forest model using data for an early warning period of 1-3 years still had the best performance among all prediction frameworks, with an accuracy of 98.75% and an AUC of 0.9901. This shows that the random forest model has excellent out-of-sample forecasting performance and is suitable for identifying the financial distress of China's listed manufacturing firms. Therefore, it is necessary to use nonlinear tools such as a random forest model to predict enterprise financial distress and thereby improve the early warning mechanism of enterprise financial distress.
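    As a rough illustration of the two metrics reported in Tables 7 and 8, the sketch below computes accuracy and a rank-based AUC in plain Python. It is not the authors' evaluation code; the toy labels and scores are hypothetical. It also computes the accuracy of a trivial rule that labels every firm healthy, using the 2021 class counts from the text (124 distressed, 1,537 healthy), which comes to about 92.5%. That figure may help explain why AUC is reported alongside accuracy for such an imbalanced sample.

```python
def accuracy(y_true, y_pred):
    """Share of firms whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: probability that a randomly chosen distressed firm (1)
    receives a higher score than a randomly chosen healthy firm (0).
    Ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one firm of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def majority_baseline(n_distressed, n_healthy):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_distressed, n_healthy) / (n_distressed + n_healthy)


# Hypothetical toy sample: 1 = distressed, 0 = healthy.
y_true = [1, 0, 0, 1, 0]
scores = [0.90, 0.20, 0.60, 0.55, 0.10]        # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Class counts for 2021 as stated in the text.
baseline_2021 = majority_baseline(124, 1537)
```

    The pairwise AUC loop is quadratic in the sample size, which is fine for a sketch; a sort-based implementation would be preferable for large samples.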

    Table 7.  ACC of out-of-sample forecasting.
    Year Logistic Tree NN RF SVM XGBoost CatBoost Mean
    Y 0.9401 0.9266 0.9401 0.9774 0.9597 0.9645 0.9368 0.9493
    Y−1 0.9384 0.9253 0.9409 0.9797 0.9562 0.9681 0.9374 0.9494
    Y−2 0.9416 0.9374 0.9398 0.9803 0.9626 0.9717 0.9567 0.9557
    Y−3 0.9314 0.9253 0.9302 0.9694 0.9562 0.9681 0.9332 0.9448
    Y−1, Y−2 0.9442 0.9386 0.9467 0.9827 0.9562 0.9717 0.9554 0.9565
    Y−1, Y−2, Y−3 0.9525 0.9488 0.9375 0.9875* 0.9375 0.9825 0.9663 0.9589
    Mean 0.9414 0.9337 0.9392 0.9795 0.9547 0.9711 0.9476
    Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.

    Table 8.  AUC of out-of-sample forecasting.
    Year Logistic Tree NN RF SVM XGBoost CatBoost Mean
    Y 0.7810 0.8051 0.7324 0.9658 0.7629 0.9370 0.9131 0.8425
    Y−1 0.7876 0.8252 0.8082 0.9674 0.7856 0.9475 0.9297 0.8645
    Y−2 0.8530 0.8385 0.8080 0.9772 0.8184 0.9574 0.9404 0.8847
    Y−3 0.8044 0.8118 0.7779 0.9643 0.7372 0.9578 0.9001 0.8505
    Y−1, Y−2 0.8584 0.8779 0.8710 0.9730 0.8377 0.9515 0.9504 0.9028
    Y−1, Y−2, Y−3 0.8572 0.8788 0.8600 0.9901* 0.8418 0.9874 0.9662 0.9116
    Mean 0.8236 0.8396 0.8096 0.9730 0.7973 0.9564 0.9333
    Note: Y represents the current period; Y−1 means that the number of early warning periods is 1 year; Y−2 means that the number of early warning periods is 2 years; Y−3 means that the number of early warning periods is 3 years; '*' indicates the optimal value.
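    The Mean row and column in Tables 7 and 8 appear to be unweighted arithmetic means rounded to four decimals. This is an assumption, since the paper does not say how they were computed. The short check below reproduces three of the printed means from values copied out of the tables.

```python
# Random forest (RF) column, rows Y through "Y-1, Y-2, Y-3".
rf_acc = [0.9774, 0.9797, 0.9803, 0.9694, 0.9827, 0.9875]  # Table 7
rf_auc = [0.9658, 0.9674, 0.9772, 0.9643, 0.9730, 0.9901]  # Table 8

# Row "Y-1, Y-2, Y-3" of Table 8 across the seven models
# (Logistic, Tree, NN, RF, SVM, XGBoost, CatBoost).
auc_all_lags = [0.8572, 0.8788, 0.8600, 0.9901, 0.8418, 0.9874, 0.9662]


def mean4(values):
    """Unweighted mean rounded to four decimals, as the tables print it."""
    return round(sum(values) / len(values), 4)


rf_acc_mean = mean4(rf_acc)          # compare with Table 7, Mean row, RF
rf_auc_mean = mean4(rf_auc)          # compare with Table 8, Mean row, RF
all_lags_mean = mean4(auc_all_lags)  # compare with Table 8, Mean column
```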


    We used the financial ratios of 1,668 Chinese A-share listed manufacturing enterprises from 2016 to 2021 for our empirical analysis. An artificial intelligence model was used to obtain the financial distress prediction values for listed manufacturing enterprises. Our results show that the random forest model predicts the financial distress of Chinese manufacturing enterprises with high accuracy, which reflects the effectiveness of artificial intelligence models for predicting the financial distress of listed manufacturing enterprises. Profitability, especially the return on equity, has the highest degree of importance for predicting financial distress in manufacturing firms.

    Based on the results in this paper and the current developmental trend of financial risks in the manufacturing industry, our research can provide certain insight into intelligent supervision.

    First, regulators need to take profitability indicators lagged by 1-3 years as early warning indicators of corporate distress to improve the long-term mechanism of financial risk prevention and control. The results in this paper show that profitability indicators lagged by 1-3 years are important predictors of the financial distress of enterprises. Therefore, in the process of improving the early warning mechanism of enterprise financial risk, we should consider the comprehensive measurement of enterprise systemic risk and further optimize the early warning mechanism of enterprise financial risk.
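    To make the notion of an indicator "lagged by 1-3 years" concrete, the sketch below builds lagged columns from a firm-year panel using only past values, so that a prediction for year Y never sees data from year Y or later. The function name `add_lags`, the panel layout and all numbers are hypothetical; only the column naming (`lag_X1`, `lag2_X1`, `lag3_X1`) follows the feature names used in the text.

```python
def add_lags(panel, indicator, max_lag=3):
    """Build lagged features for one indicator.

    panel maps (firm, year) -> {indicator_name: value}.
    A firm-year is kept only if all lags 1..max_lag are available.
    """
    rows = []
    for (firm, year), _ in sorted(panel.items()):
        row = {"firm": firm, "year": year}
        complete = True
        for k in range(1, max_lag + 1):
            past = panel.get((firm, year - k))
            if past is None or indicator not in past:
                complete = False
                break
            # The text writes the 1-year lag as "lag_X.." and longer lags
            # as "lag2_X..", "lag3_X..".
            prefix = "lag" if k == 1 else f"lag{k}"
            row[f"{prefix}_{indicator}"] = past[indicator]
        if complete:
            rows.append(row)
    return rows


# Hypothetical return-on-equity (X1) values for two made-up firms.
panel = {
    ("A", 2018): {"X1": 0.10},
    ("A", 2019): {"X1": 0.08},
    ("A", 2020): {"X1": 0.02},
    ("A", 2021): {"X1": -0.05},
    ("B", 2020): {"X1": 0.12},
    ("B", 2021): {"X1": 0.11},
}

rows = add_lags(panel, "X1")
```

    In this toy panel only firm A in 2021 has three full years of history, so it is the only row produced. Dropping incomplete histories is one possible choice; how the paper handled firms with shorter histories is not stated.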

    Second, it is necessary to introduce cutting-edge machine learning monitoring tools, where appropriate, to improve the prudential supervision mechanism at the technical level. In recent years, with the continuous development of regulatory technology, frontier technologies such as big data and machine learning have been actively promoted and applied in the field of financial supervision in various countries. Combining machine learning methods with financial risk monitoring, early warning, prevention and control systems is likely to become one approach to improving the prudential supervision mechanism in the future. The empirical results in this paper show that the forecasting performance of the random forest model is consistently better than that of the other forecasting models, with more robust and accurate forecasts. Therefore, machine learning methods can be combined with existing supervision to strengthen the ability to predict the financial crisis of listed enterprises.

    It is worth noting that, although this paper shows that the random forest model can effectively identify the financial distress of the vast majority of enterprises, its identification efficiency will be reduced in the face of financial distress caused by financial fraud, net asset value shrinkage, sudden operational damage and so on. This means that regulatory authorities should also use qualitative regulatory means, such as on-site inspection, to urge listed companies to optimize corporate governance and curb the growth of corporate financial risks. There are several future directions for this research. First, considering the constantly changing financial environment, more valuable text features or other features from social media can be applied in the model to further improve the recognition performance. Second, this study treated financial distress prediction as a binary classification. However, financial distress has different degrees, and future research needs to explore prediction models with three or more categories.

    This research was supported by the National Natural Science Foundation of China (No. 12101622).

    The authors declare no conflict of interest.

    A number of abbreviations have been adopted above. For ease of understanding, we include the relevant abbreviations and their explanations in Table A1.

    Table A1.  Abbreviations and explanations.
    Abbreviation Explanation Abbreviation Explanation
    AI Artificial intelligence NN Neural network
    ACC Accuracy RF Random forest
    AUC Area under receiver operating characteristic curve ROC Receiver operating characteristic
    CatBoost Categorical boosting Skew. Skewness
    Kurt. Kurtosis ST Special treatment
    Logistic Logistic regression Std. Dev. Standard deviation
    Max. Maximum value SVM Support vector machines



    [1] E. I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, 23 (1968), 589–609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x
    [2] S. B. Jabeur, C. Gharib, S. Mefteh-Wali, W. B. Arfi, CatBoost model and artificial intelligence techniques for corporate failure prediction, Technol. Forecast. Soc., 166 (2021), 120658. https://doi.org/10.1016/j.techfore.2021.120658
    [3] K. Peng, G. Yan, A survey on deep learning for financial risk prediction, Quant. Financ. Econ., 5 (2021), 716–737. https://doi.org/10.3934/QFE.2021032
    [4] T. M. Awan, M. S. Khan, I. U. Haq, S. Kazmi, Oil and stock markets volatility during pandemic times: a review of G7 countries, Green Finance, 3 (2021), 15–27. https://doi.org/10.3934/GF.2021002
    [5] E. I. Altman, Predicting financial distress of companies: revisiting the Z-score and ZETA® models, In: Handbook of research methods and applications in empirical finance, Edward Elgar Publishing, 2013, 428–456. https://doi.org/10.4337/9780857936080.00027
    [6] T. H. Li, X. Li, G. K. Liao, Business cycles and energy intensity. Evidence from emerging economies, Borsa Istanb. Rev., 22 (2022), 560–570. https://doi.org/10.1016/j.bir.2021.07.005
    [7] S. L. Chen, J. H. Zhong, P. Failler, Does China transmit financial cycle spillover effects to the G7 countries?, Economic Research-Ekonomska Istrazivanja, 35 (2022), 5184–5201. https://doi.org/10.1080/1331677x.2021.2025123
    [8] Y. Liu, Z. H. Li, M. R. Xu, The influential factors of financial cycle spillover: evidence from China, Emerg. Mark. Financ. Tr., 56 (2020), 1336–1350. https://doi.org/10.1080/1540496x.2019.1658076
    [9] F. Mai, S. N. Tian, C. Lee, L. Ma, Deep learning models for bankruptcy prediction using textual disclosures, Eur. J. Oper. Res., 274 (2019), 743–758. https://doi.org/10.1016/j.ejor.2018.10.024
    [10] D. Qiu, D. Li, Comments on the "SSF Report" from the perspective of economic statistics, Green Finance, 3 (2021), 403–463. https://doi.org/10.3934/GF.2021020
    [11] Z. H. Li, J. H. Zhong, Impact of economic policy uncertainty shocks on China's financial conditions, Financ. Res. Lett., 35 (2020), 101303. https://doi.org/10.1016/j.frl.2019.101303
    [12] S. P. Zhao, K. Xu, Z. Wang, C. Liang, W. Lu, B. Chen, Financial distress prediction by combining sentiment tone features, Econ. Model., 106 (2022), 105709. https://doi.org/10.1016/j.econmod.2021.105709
    [13] Z. H. Li, H. Dong, C. Floros, A. Charemis, P. Failler, Re-examining Bitcoin volatility: A CAViaR-based approach, Emerg. Mark. Financ. Tr., 58 (2022), 1320–1338. https://doi.org/10.1080/1540496x.2021.1873127
    [14] Z. Huang, H. Dong, S. Jia, Equilibrium pricing for carbon emission in response to the target of carbon emission peaking, Energ. Econ., 112 (2022), 106160. https://doi.org/10.1016/j.eneco.2022.106160
    [15] T. T. Chen, S. J. Lee, A weighted LS-SVM based learning system for time series forecasting, Inform. Sciences, 299 (2015), 99–116. https://doi.org/10.1016/j.ins.2014.12.031
    [16] E. Dumitrescu, S. Hue, C. Hurlin, S. Tokpavi, Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects, Eur. J. Oper. Res., 297 (2022), 1178–1192. https://doi.org/10.1016/j.ejor.2021.06.053
    [17] J. M. Liu, S. C. Zhang, H. Y. Fan, A two-stage hybrid credit risk prediction model based on XGBoost and graph-based deep neural network, Expert Syst. Appl., 195 (2022), 116624. https://doi.org/10.1016/j.eswa.2022.116624
    [18] S. Bag, S. Gupta, A. Kumar, U. Sivarajah, An integrated artificial intelligence framework for knowledge creation and B2B marketing rational decision making for improving firm performance, Ind. Market. Manag., 92 (2021), 178–189. https://doi.org/10.1016/j.indmarman.2020.12.001
    [19] F. Barboza, H. Kimura, E. Altman, Machine learning models and bankruptcy prediction, Expert Syst. Appl., 83 (2017), 405–417. https://doi.org/10.1016/j.eswa.2017.04.006
    [20] S. Papadopoulos, C. E. Kontokosta, Grading buildings on energy performance using city benchmarking data, Appl. Energ., 233 (2019), 244–253. https://doi.org/10.1016/j.apenergy.2018.10.053
    [21] R. Kellner, M. Nagl, D. Rosch, Opening the black box—Quantile neural networks for loss given default prediction, J. Bank. Financ., 134 (2022), 106334. https://doi.org/10.1016/j.jbankfin.2021.106334
    [22] D. Qiu, D. Li, Paradox in deviation measure and trap in method improvement—take international comparison as an example, Quant. Financ. Econ., 5 (2021), 591–603. https://doi.org/10.3934/QFE.2021026
    [23] T. Q. Chen, C. Guestrin, XGBoost: a scalable tree boosting system, In: KDD'16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco: Assoc Computing Machinery, 2016, 785–794. https://doi.org/10.1145/2939672.2939785
    [24] P. Carmona, F. Climent, A. Momparler, Predicting failure in the U.S. banking sector: an extreme gradient boosting approach, Int. Rev. Econ. Financ., 61 (2019), 304–323. https://doi.org/10.1016/j.iref.2018.03.008
    [25] D. Ardila, A. Ahmed, D. Sornette, Comparing ask and transaction prices in the Swiss housing market, Quant. Financ. Econ., 5 (2021), 67–93. https://doi.org/10.3934/QFE.2021004
    [26] Z. H. Li, H. Chen, B. Mo, Can digital finance promote urban innovation? Evidence from China, Borsa Istanb. Rev., in press. https://doi.org/10.1016/j.bir.2022.10.006
    [27] L. Breiman, Random forests, Machine Learning, 45 (2001), 5–32. https://doi.org/10.1023/a:1010933404324
    [28] R. Katuwal, P. N. Suganthan, L. Zhang, Heterogeneous oblique random forest, Pattern Recogn., 99 (2020), 107078. https://doi.org/10.1016/j.patcog.2019.107078
    [29] E. I. Altman, M. Iwanicz-Drozdowska, E. K. Laitinen, A. Suvas, Financial distress prediction in an international context: a review and empirical analysis of Altman's Z-score model, J. Int. Fin. Manag. Acc., 28 (2017), 131–171. https://doi.org/10.1111/jifm.12053
    [30] G. K. Liao, P. Hou, X. Y. Shen, K. Albitar, The impact of economic policy uncertainty on stock returns: the role of corporate environmental responsibility engagement, Int. J. Financ. Econ., 26 (2021), 4386–4392. https://doi.org/10.1002/ijfe.2020
    [31] R. B. Geng, I. Bose, X. Chen, Prediction of financial distress: an empirical study of listed Chinese companies using data mining, Eur. J. Oper. Res., 241 (2015), 236–247. https://doi.org/10.1016/j.ejor.2014.08.016
    [32] X. B. Tang, S. X. Li, M. L. Tan, W. X. Shi, Incorporating textual and management factors into financial distress prediction: a comparative study of machine learning methods, J. Forecasting, 39 (2020), 769–787. https://doi.org/10.1002/for.2661
    [33] H. Li, C. J. Li, X. J. Wu, J. Sun, Statistics-based wrapper for feature selection: an implementation on financial distress identification with support vector machine, Appl. Soft Comput., 19 (2014), 57–67. https://doi.org/10.1016/j.asoc.2014.01.018
    [34] S. Liu, X. Shen, T. Jiang, P. Failler, Impacts of the financialization of manufacturing enterprises on total factor productivity: empirical examination from China's listed companies, Green Finance, 3 (2021), 59–89. https://doi.org/10.3934/GF.2021005
    [35] Y. Xia, L. He, Y. Li, N. Liu, Y. Ding, Predicting loan default in peer-to-peer lending using narrative data, J. Forecasting, 39 (2020), 260–280. https://doi.org/10.1002/for.2625
    [36] E. A. Mohamed, I. E. Ahmed, R. Mehdi, H. Hussain, Impact of corporate performance on stock price predictions in the UAE markets: Neuro-fuzzy model, Intell. Syst. Account., 28 (2021), 52–71. https://doi.org/10.1002/isaf.1484
    [37] M. Zhou, H. Liu, Y. Hu, Research on corporate financial performance prediction based on self-organizing and convolutional neural networks, Expert Syst., 39 (2022), e13042. https://doi.org/10.1111/exsy.13042
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)