Optimal Policy Strategy for Pandemic Outbreak Control: A Deep Reinforcement Approach

Authors

  • Raphael Ibraimoh, School of Science, Engineering and Environment, University of Salford, UK
  • Mohammed Saraee, School of Science, Engineering and Environment, University of Salford and Data Science and AI Hub, University of Salford, UK
  • Kaveh Kiani, School of Science, Engineering and Environment, University of Salford and Data Science and AI Hub, University of Salford, UK
  • Danial Saraee, Hull University Teaching Hospitals NHS Trust, UK https://orcid.org/0000-0003-1831-4297

DOI:

https://doi.org/10.47852/bonviewAIA52026822

Keywords:

COVID-19, deep reinforcement learning, discrete action space, dueling Q-network, lockdown, travel restrictions

Abstract

Since the global spread of the COVID-19 pandemic in January 2020, residents of the United Kingdom (UK) have altered their daily routines because of the transmissibility of the virus. Sanitisation, quarantine, contact tracing, mass testing, and vaccination have been implemented, affecting virus control, quality of life, resources, and economic development. For the period from January 2020 to January 2021, repositories from the Office for National Statistics, NHS England, and the WHO provided statistics on confirmed cases, recoveries, and mortality. Wikipedia and Our World In Data provided the UK lockdown and travel restriction timelines. Deep reinforcement learning with a dueling Q-learning algorithm and a well-defined reward function was used to determine the optimal timing of lockdown and travel restrictions. Initially, our agent (model) suggested strict lockdown and travel restrictions. By mid-March, its advisories had decreased significantly. In late March, key public health initiatives were introduced. Over the initial three months, the agent's recommendations gained support; it proposed slightly milder lockdown measures than the public health policy but stricter travel restrictions. The agent generally suggested lockdown and travel limitations before public health authorities or the government approved them: it recommended implementing policies in late January, while authorities delayed until late March. Furthermore, the agent advised against postponing UK policy implementation.
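
As an illustration of the dueling Q-learning idea mentioned in the abstract, the following minimal sketch shows how a dueling architecture typically combines a state value V(s) with per-action advantages A(s, a) over a discrete action space. It is not the authors' implementation: the action levels, the function names (dueling_q_values, greedy_action), and the numeric head outputs are all hypothetical and chosen only for demonstration.

```python
from itertools import product

# Hypothetical discrete action space: (lockdown level, travel restriction level).
# These levels are illustrative assumptions, not taken from the paper.
LOCKDOWN_LEVELS = ("none", "partial", "full")
TRAVEL_LEVELS = ("none", "partial", "full")
ACTIONS = list(product(LOCKDOWN_LEVELS, TRAVEL_LEVELS))


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    """Return the index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Made-up outputs of the value head and the advantage head for one state.
example_value = 1.0
example_advantages = [0.0, 0.5, 1.0, -0.5, 0.25, 2.0, -1.0, 0.0, 0.0]

q_values = dueling_q_values(example_value, example_advantages)
best = greedy_action(q_values)
print(ACTIONS[best])
```

Subtracting the mean advantage keeps the decomposition identifiable: the Q-values average to V(s), so the value head and the advantage head cannot trade a constant offset between them. In a full agent, both heads would be outputs of a neural network trained against a reward function such as the one the abstract refers to.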

Received: 15 July 2025 | Revised: 4 November 2025 | Accepted: 27 November 2025

Conflicts of Interest

The authors declare that they have no conflicts of interest to this work.

Data Availability Statement

The data that support the findings of this study are openly available from the Johns Hopkins COVID-19 Data Repository at https://github.com/CSSEGISandData/COVID-19, the World Health Organization at https://www.who.int/emergencies/diseases/novel-coronavirus-2019, Our World In Data COVID-19 pandemic data at https://ourworldindata.org/coronavirus, and Wikipedia at https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic.

Author Contribution Statement

Raphael Ibraimoh: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – original draft, Visualization, Project administration. Mohammed Saraee: Validation, Writing – review & editing, Supervision, Project administration. Kaveh Kiani: Writing – review & editing, Supervision, Project administration. Danial Saraee: Writing – review & editing.


Published

2025-12-30

Section

Research Article

How to Cite

Ibraimoh, R., Saraee, M., Kiani, K., & Saraee, D. (2025). Optimal Policy Strategy for Pandemic Outbreak Control: A Deep Reinforcement Approach. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA52026822