Research article

Association of blood heavy metals with risk of stroke: A machine learning-based study from NHANES 2017-2020

  • Published: 18 December 2025
  • Stroke is a condition in which the brain and spinal cord are damaged by abnormal blood supply, and it is one of the leading causes of disability and death worldwide. Because the incidence of stroke is rising rapidly, predicting and diagnosing it early is an urgent public health need. Environmental heavy metals have been implicated in stroke risk, but their role in early prediction remains understudied. This study incorporates heavy metal levels from environmental exposure into a stroke risk prediction model and analyzes how these heavy metal features are associated with stroke. First, data were extracted from the NHANES database and processed by feature vectorization, K-nearest neighbor imputation, and normalization. Because the classes were imbalanced, five methods for handling the imbalance were compared; the cost-sensitive method performed best, with an accuracy of 0.98. Feature selection and correlation analysis yielded 14 important features, including two heavy metals: blood lead and blood manganese. Second, a variety of traditional and ensemble machine learning models were evaluated and compared, including random forest (RF), gradient-boosted decision trees (GBDT), and XGBoost. The random forest model performed best, with an accuracy of 0.96, a precision of 0.93, a recall of 0.98, and an F1 score of 0.95. Shapley additive explanations (SHAP) were then used to explain the predictions and to analyze the influence of each feature on them. Blood lead level was significantly associated with stroke (95% confidence interval: 0.227-0.522; p-value < 0.001), while blood manganese was significantly negatively correlated with stroke risk (95% confidence interval: 0.131-0.022; p-value < 0.006). These results were further verified by plotting dose-response curves. The findings suggest that environmental heavy metal exposure has important value in stroke prediction.
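The abstract does not say how the cost-sensitive method weights the two classes or how the metrics were computed, so the following is only a minimal sketch under stated assumptions. It uses the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)) for the class weights and the standard definitions of accuracy, precision, recall, and F1. The label counts and confusion-matrix counts are hypothetical illustrations, not data from the study. Only the precision (0.93) and recall (0.98) passed to `f1_from` are taken from the abstract.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count).

    Assumption: this is one common cost-sensitive heuristic, not
    necessarily the weighting used in the study.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical imbalanced labels: 95 non-stroke (0) and 5 stroke (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)  # the rare class gets weight 10.0

# Hypothetical confusion counts, for illustration only.
metrics = classification_metrics(tp=9, fp=1, fn=1, tn=89)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_from(0.93, 0.98)  # about 0.954, i.e. 0.95 after rounding
```

The last line shows that the reported F1 score of 0.95 agrees with the reported precision and recall. The weights illustrate why accuracy alone can mislead on imbalanced data: with 95% negatives, a model that never predicts stroke would still reach 0.95 accuracy, which is why recall and F1 are reported alongside it.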

    Citation: KeXin Li, FengQi Liu, Enxiao Zhu. Association of blood heavy metals with risk of stroke: A machine learning-based study from NHANES 2017-2020[J]. Big Data and Information Analytics, 2025, 9: 400-426. doi: 10.3934/bdia.2025020


  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
Figures(20)  /  Tables(8)
