Respiratory diseases are among the most significant economic burdens on healthcare systems worldwide. Case counts are rising, and their variation depends strongly on seasonal climatic effects, socioeconomic factors, and pollution. Understanding these variations and obtaining precise forecasts therefore helps health authorities decide how to allocate limited economic and human resources. We aimed to model and forecast weekly hospitalizations due to respiratory conditions in seven regional hospitals in Costa Rica using four statistical learning techniques (Random Forest, XGBoost, Facebook's Prophet forecasting model, and an ensemble combining these three methods), along with 22 climate change indices and aerosol optical depth as an indicator of pollution. Models were trained on data from 2000 to 2018 and evaluated on data from 2019, which served as testing data. Within the training period, we set up 2-year sliding windows with a 1-year assessment period and used grid search to optimize the hyperparameters of each model. The best model for each region was selected on the testing data, based on predictive precision and with the aim of preventing overfitting. Prediction intervals were then computed using conformal inference. The relative importance of all climatic variables was computed for the best model, and the selected models showed similar patterns in some of the seven regions. Finally, reliable predictions were obtained for each of the seven regional hospitals.
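The resampling and interval steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sliding_windows` and `split_conformal_interval`, the 52-week year, the one-year step between windows, and the use of absolute residuals as conformity scores are all hypothetical choices made for the example. Split conformal intervals also assume exchangeable residuals, which holds only approximately for time series.

```python
import math


def sliding_windows(n_obs, train_size, assess_size, step):
    """Yield (train, assess) index ranges for fixed-length sliding windows.

    Each training window covers `train_size` consecutive observations and is
    followed immediately by `assess_size` observations used for assessment.
    The window then moves forward by `step` observations (an assumed value).
    """
    start = 0
    while start + train_size + assess_size <= n_obs:
        train = range(start, start + train_size)
        assess = range(start + train_size, start + train_size + assess_size)
        yield train, assess
        start += step


def split_conformal_interval(calib_actual, calib_pred, new_pred, alpha=0.05):
    """Return a (lower, upper) split conformal interval around `new_pred`.

    Conformity scores are absolute residuals on a held-out calibration set
    (an assumption; other scores are possible). The half-width is the
    ceil((n + 1) * (1 - alpha))-th smallest score, the usual finite-sample
    quantile for split conformal prediction.
    """
    scores = sorted(abs(a - p) for a, p in zip(calib_actual, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # With too few calibration points the interval would be unbounded.
        raise ValueError("not enough calibration points for this alpha")
    half_width = scores[k - 1]
    return new_pred - half_width, new_pred + half_width
```

Under these assumptions, 19 years of weekly data (about 988 weeks) with 104-week training windows, 52-week assessment periods, and a 52-week step would yield 17 train/assessment splits. Each hyperparameter combination in a grid search would then be scored on the assessment part of every split.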
Citation: Shu Wei Chou-Chen, Luis A. Barboza. Forecasting hospital discharges for respiratory conditions in Costa Rica using climate and pollution data. Mathematical Biosciences and Engineering, 2024, 21(7): 6539-6558. doi: 10.3934/mbe.2024285