Machine learning model of tax arrears prediction based on knowledge graph

Jie Zheng; Yijun Li

doi:10.3934/era.2023206

Electronic Research Archive

2023, Volume 31, Issue 7: 4057-4076

Research article


School of Management, Harbin Institute of Technology, Harbin 150001, China

Received: 22 February 2023 Revised: 27 April 2023 Accepted: 03 May 2023 Published: 25 May 2023

Most existing research on enterprise tax arrears prediction relies on the financial situation of enterprises and does not consider how relationships among enterprises influence tax arrears. This paper integrates multivariate data to construct an enterprise knowledge graph. Correlations between enterprises and risk events are then selected from the knowledge graph as prediction variables. Finally, a machine learning model for tax arrears prediction is constructed and implemented, with better predictive power than models in earlier studies. The results show that enterprises are commonly linked to tax arrears events through a shared telephone number, a shared e-mail address or the same legal person. Based on these correlations, the machine learning model can effectively predict potential tax arrears. The approach establishes a new method of tax arrears prediction and provides new ideas and an analysis framework for tax management practice.

Keywords:
- tax arrears prediction
- knowledge graph
- machine learning
- enterprise association relationship
- risk contagion

Citation: Jie Zheng, Yijun Li. Machine learning model of tax arrears prediction based on knowledge graph[J]. Electronic Research Archive, 2023, 31(7): 4057-4076. doi: 10.3934/era.2023206
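
The feature construction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the records, the identifiers and the function names (`build_index`, `arrears_link_features`) are hypothetical, and a plain dictionary index stands in for a knowledge graph. For each enterprise, the sketch counts how many other enterprises with a tax arrears event share its telephone number, e-mail address or legal person.

```python
from collections import defaultdict

# Hypothetical toy records: (enterprise id, telephone, e-mail, legal person).
ENTERPRISES = [
    ("A", "555-0101", "a@example.com", "Wang"),
    ("B", "555-0101", "b@example.com", "Li"),
    ("C", "555-0202", "b@example.com", "Wang"),
    ("D", "555-0303", "d@example.com", "Zhao"),
]
# Hypothetical set of enterprises with a recorded tax arrears event.
IN_ARREARS = {"A"}

# The three association types named in the abstract.
LINK_TYPES = ("telephone", "email", "legal_person")


def build_index(enterprises):
    """Map each (link type, value) pair to the enterprises that share it."""
    index = defaultdict(set)
    for ent_id, *values in enterprises:
        for link_type, value in zip(LINK_TYPES, values):
            index[(link_type, value)].add(ent_id)
    return index


def arrears_link_features(enterprises, in_arrears):
    """Count, per enterprise and link type, the other enterprises in
    arrears that share the same attribute value."""
    index = build_index(enterprises)
    features = {}
    for ent_id, *values in enterprises:
        row = {}
        for link_type, value in zip(LINK_TYPES, values):
            neighbours = index[(link_type, value)] - {ent_id}
            row[link_type] = len(neighbours & in_arrears)
        features[ent_id] = row
    return features


FEATURES = arrears_link_features(ENTERPRISES, IN_ARREARS)
```

In this toy data, enterprise B shares a telephone number with A and enterprise C shares a legal person with A, so each receives a count of 1 for that link type. Rows of this kind could serve as input variables for a classifier; the choice of classifier is left open here.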



© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)