In the absence of pharmaceutical interventions, social distancing and lockdowns have been key options for controlling new or reemerging respiratory infectious disease outbreaks. Timely implementation of these interventions is vital both for controlling an outbreak and for safeguarding the economy.

Motivated by the COVID-19 pandemic, we evaluated whether, when, and to what level lockdowns are necessary to minimize the epidemic and economic burdens of new disease outbreaks. We formulated the question as a sequential decision-making problem, specifically a Markov Decision Process, and solved it using a deep Q-network algorithm. We evaluated the question under two objective functions: a 2-objective function that minimizes economic burden and hospital capacity violations, suitable for diseases with severe health risks but minimal mortality, and a 3-objective function that additionally minimizes the number of deaths, suitable for diseases with a high risk of mortality.

A key feature of the model is that we evaluated these questions in the context of two geographical jurisdictions that interact through travel but make autonomous and independent decisions, and we considered both cross-jurisdictional cooperation and non-cooperation. In the 2-objective function under cross-jurisdictional cooperation, the optimal policy was to aim for shutdowns at 50% and 25% per day. Although this policy avoided hospital capacity violations, the shutdowns extended until a large proportion of the population reached herd immunity. Delays in initiating this optimal policy, or non-cooperation from an outside jurisdiction, required shutdowns at a higher level of 75% per day, thus adding to the economic burden. In the 3-objective function, the optimal policy under cross-jurisdictional cooperation was to aim for shutdowns of up to 75% per day to prevent deaths by reducing infected cases. This optimal policy continued for the entire duration of the simulation, suggesting that, until pharmaceutical interventions such as treatment or vaccines become available, contact reductions through physical distancing would be necessary to minimize deaths. Deviating from this policy increased the number of shutdowns and led to several deaths.

In summary, we present a decision-analytic methodology for identifying the optimal lockdown strategy in the context of interacting jurisdictions that make autonomous and independent decisions. The numerical results are intuitive and, as expected, serve as a proof of feasibility for such a model. Our sensitivity analysis shows that the optimal policy is robust to minor changes in the transmission rate but sensitive to larger deviations. This finding underscores the dynamic nature of epidemic parameters and the need to train models across a diverse range of values to support effective policy-making.
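To make the Markov Decision Process formulation more concrete, the following is a minimal sketch of a state transition and a 2-objective reward for two jurisdictions linked by travel. It is an illustration under stated assumptions, not the model used in the paper: the simple SIR dynamics, the `travel` mixing term, the function names, and every parameter value (`beta`, `gamma`, `hosp_rate`, `hospital_capacity`, and the weights) are hypothetical. Only the lockdown levels of 25%, 50%, and 75% echo the abstract.

```python
from dataclasses import dataclass

# Action set: fraction of activity shut down per day (0% is an assumed "no lockdown" action).
LOCKDOWN_LEVELS = (0.0, 0.25, 0.50, 0.75)


@dataclass
class Jurisdiction:
    s: float  # susceptible fraction of the population
    i: float  # infected fraction
    r: float  # recovered fraction


def step(a, b, level_a, level_b, beta=0.3, gamma=0.1, travel=0.05):
    """Advance both jurisdictions by one day (hypothetical SIR dynamics).

    Each jurisdiction picks its own lockdown level, but a share `travel`
    of its contacts is with the other jurisdiction's population.
    """
    updated = []
    for own, other, level in ((a, b, level_a), (b, a, level_b)):
        # Infection prevalence seen by residents, including travelers.
        mixed_i = (1.0 - travel) * own.i + travel * other.i
        # A lockdown at `level` scales contacts down by (1 - level).
        infections = beta * (1.0 - level) * mixed_i * own.s
        recoveries = gamma * own.i
        updated.append(
            Jurisdiction(
                s=own.s - infections,
                i=own.i + infections - recoveries,
                r=own.r + recoveries,
            )
        )
    return updated[0], updated[1]


def reward(j, level, hospital_capacity=0.01, hosp_rate=0.05,
           w_econ=1.0, w_hosp=100.0):
    """2-objective reward: penalize economic burden and capacity violations."""
    hospitalized = hosp_rate * j.i
    overflow = max(0.0, hospitalized - hospital_capacity)
    return -(w_econ * level + w_hosp * overflow)
```

In a reinforcement learning setup, an agent for one jurisdiction would observe a state such as `(s, i, r)`, choose a level from `LOCKDOWN_LEVELS`, and receive `reward(...)` after each call to `step(...)`. Holding the other jurisdiction's level fixed at `0.0` is one simple way to mimic non-cooperation in this sketch. A 3-objective variant would add a weighted death term to the reward.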
Citation: Seyedeh Nazanin Khatami, Chaitra Gopalappa. Deep reinforcement learning framework for controlling infectious disease outbreaks in the context of multi-jurisdictions. Mathematical Biosciences and Engineering, 2023, 20(8): 14306-14326. doi: 10.3934/mbe.2023640