Research article

A robust framework for enhancing cardiovascular disease risk prediction using an optimized category boosting model

  • Received: 13 October 2023 Revised: 21 January 2024 Accepted: 23 January 2024 Published: 29 January 2024
  • Cardiovascular disease (CVD) is a leading cause of mortality worldwide, and accurate assessment of CVD risk is essential for prevention and intervention. In recent years, machine learning has advanced CVD risk prediction considerably. In this context, we propose a framework, CVD-OCSCatBoost, for predicting CVD risk and assessing individual risk factors. The framework uses Lasso regression for feature selection and an optimized category-boosting tree (CatBoost) model for classification. To optimize the model, we propose the opposition-based learning cuckoo search (OCS) algorithm. Integrating OCS with CatBoost yields OCSCatBoost, a classifier intended to improve both the accuracy and the efficiency of CVD prediction. We validate OCSCatBoost through extensive comparisons with the particle swarm optimization (PSO) algorithm, the seagull optimization algorithm (SOA), the cuckoo search (CS) algorithm, K-nearest-neighbor classification, decision tree, logistic regression, grid-search support vector machine (SVM), grid-search XGBoost, default CatBoost, and grid-search CatBoost. In these experiments, OCSCatBoost outperforms the other models, with overall accuracy, recall, and AUC values of 73.67%, 72.17%, and 0.8024, respectively. These results highlight the potential of CVD-OCSCatBoost for improving cardiovascular disease risk prediction.

    Citation: Zhaobin Qiu, Ying Qiao, Wanyuan Shi, Xiaoqian Liu. A robust framework for enhancing cardiovascular disease risk prediction using an optimized category boosting model[J]. Mathematical Biosciences and Engineering, 2024, 21(2): 2943-2969. doi: 10.3934/mbe.2024131
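    The abstract names the two ingredients of OCS, opposition-based learning and cuckoo search, without giving the update rules. The sketch below shows one common way to combine them and should be read as an illustration under stated assumptions, not as the authors' algorithm. It uses opposition-based initialization, Lévy-flight moves toward the best nest (Mantegna's method), and a nest-abandonment step that also tries the opposite point. The function names, default parameters, and the toy objective are all hypothetical. In the framework described above, the objective would instead be a cross-validated CatBoost error over hyperparameters.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step length using Mantegna's method."""
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / (abs(v) + 1e-12) ** (1 / beta)  # small guard against v == 0


def clip(x, bounds):
    """Clamp each coordinate to its [lo, hi] range."""
    return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]


def opposite(x, bounds):
    """Opposition-based learning: reflect each coordinate to lo + hi - x."""
    return [lo + hi - xi for xi, (lo, hi) in zip(x, bounds)]


def ocs_minimize(objective, bounds, n_nests=15, n_iter=100, pa=0.25,
                 alpha=0.01, seed=0):
    """Minimize `objective` over box `bounds` (hypothetical OCS variant)."""
    rng = random.Random(seed)

    # Opposition-based initialization: sample nests, add their opposites,
    # and keep the better half of the combined pool.
    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    pool = nests + [opposite(x, bounds) for x in nests]
    pool.sort(key=objective)
    nests = pool[:n_nests]
    scores = [objective(x) for x in nests]
    history = []

    def try_replace(i, cand):
        # Greedy acceptance: a nest is replaced only by a better candidate.
        cand = clip(cand, bounds)
        s = objective(cand)
        if s < scores[i]:
            nests[i], scores[i] = cand, s

    for _ in range(n_iter):
        best = list(nests[scores.index(min(scores))])

        # Levy flight, scaled by each nest's distance to the current best.
        for i in range(n_nests):
            cand = [xi + alpha * levy_step(rng) * (xi - bi)
                    for xi, bi in zip(nests[i], best)]
            try_replace(i, cand)

        # Abandonment with probability pa: random walk between two nests,
        # then also try the opposite of that candidate.
        for i in range(n_nests):
            if rng.random() < pa:
                j, k = rng.randrange(n_nests), rng.randrange(n_nests)
                r = rng.random()
                cand = clip([xi + r * (xj - xk) for xi, xj, xk
                             in zip(nests[i], nests[j], nests[k])], bounds)
                try_replace(i, cand)
                try_replace(i, opposite(cand, bounds))

        history.append(min(scores))

    b = scores.index(min(scores))
    return nests[b], scores[b], history


def toy_objective(params):
    """Stand-in for a validation error; the optimum is made up."""
    learning_rate, depth = params
    return (learning_rate - 0.1) ** 2 + ((depth - 6.0) / 10.0) ** 2


if __name__ == "__main__":
    search_space = [(0.01, 0.3), (2.0, 10.0)]  # hypothetical ranges
    best, score, history = ocs_minimize(toy_objective, search_space)
    print("best parameters:", best, "score:", score)
```

    Because every replacement is greedy, the best score recorded in `history` never gets worse from one iteration to the next. Tree depth would need to be rounded to an integer before being passed to a real model.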



  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License.