Prediction of coronary heart disease in gout patients using machine learning models

Lili Jiang; Sirong Chen; Yuanhui Wu; Da Zhou; Lihua Duan

doi:10.3934/mbe.2023212

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 3: 4574-4591. doi: 10.3934/mbe.2023212


Research article

1. Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
2. School of Mathematical Sciences, Soochow University, Suzhou, China
3. School of Mathematical Sciences, Xiamen University, Xiamen, China
† The authors contributed equally to this work.

Received: 14 October 2022 Revised: 27 November 2022 Accepted: 04 December 2022 Published: 27 December 2022

Abstract

Growing evidence shows that gout patients face an increased risk of cardiovascular diseases, especially coronary heart disease (CHD). Screening for CHD in gout patients based on simple clinical factors remains challenging. Here we aim to build a machine learning-based diagnostic model that avoids, as far as possible, both missed diagnoses and excessive examinations. Over 300 patient samples collected from Jiangxi Provincial People's Hospital were divided into two groups (gout and gout+CHD), so the prediction of CHD in gout patients was modeled as a binary classification problem. Eight clinical indicators were selected as features for the machine learning classifiers. A combined sampling technique was used to address class imbalance in the training dataset. Eight machine learning models were used: logistic regression, decision tree, ensemble learning models (random forest, XGBoost, LightGBM, GBDT), support vector machine (SVM) and neural networks. Our results showed that stepwise logistic regression and SVM achieved higher AUC values, while the random forest and XGBoost models performed better in terms of recall and accuracy. Furthermore, several high-risk factors were found to be effective indicators for predicting CHD in gout patients, which provides insight into clinical diagnosis.

Keywords: gout; CHD; machine learning; diagnostic model; imbalance data; risk factor selection

Citation: Lili Jiang, Sirong Chen, Yuanhui Wu, Da Zhou, Lihua Duan. Prediction of coronary heart disease in gout patients using machine learning models[J]. Mathematical Biosciences and Engineering, 2023, 20(3): 4574-4591. doi: 10.3934/mbe.2023212
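
The workflow summarized in the abstract (rebalancing the training data, then scoring a binary classifier by recall, accuracy and AUC) can be illustrated with a minimal, dependency-free sketch. The abstract does not say which combined sampling technique was used, so the random over- and undersampling below is an assumed stand-in, not the authors' method. The function names, the toy data and the 0.5 decision threshold are likewise hypothetical.

```python
import random


def combined_resample(rows, labels, target_per_class, seed=0):
    """Assumed stand-in for a combined sampling step: randomly undersample
    classes larger than target_per_class and oversample smaller ones."""
    rng = random.Random(seed)
    out_rows, out_labels = [], []
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        if len(members) >= target_per_class:
            # Undersample: draw without replacement.
            chosen = rng.sample(members, target_per_class)
        else:
            # Oversample: keep every original sample, add random duplicates.
            shortfall = target_per_class - len(members)
            chosen = members + [rng.choice(members) for _ in range(shortfall)]
        out_rows.extend(chosen)
        out_labels.extend([cls] * target_per_class)
    return out_rows, out_labels


def recall_and_accuracy(y_true, y_pred):
    """Recall for the positive class (label 1) and overall accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = gout+CHD (minority), 0 = gout only (majority).
train_rows = [[i] for i in range(10)]
train_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
bal_rows, bal_labels = combined_resample(train_rows, train_labels,
                                         target_per_class=5)

# Made-up held-out labels and classifier scores; the test set is not resampled.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
recall, accuracy = recall_and_accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Only the training split is rebalanced here, in line with the abstract's statement that sampling was applied to the training dataset. Recall counts how many CHD cases are caught, which is why it is reported alongside accuracy when missed diagnoses are the main concern.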


© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)