Spatio-Temporal Attributes of Varicella-Zoster Case Number Trends Assist with Optimizing Machine Learning Predictions
DOI:
https://doi.org/10.47852/bonviewMEDIN32021675
Keywords:
weeks-ahead VZV case forecasting, spatio-temporal trend attributes, feature importance analysis, optimized machine learning, epidemiological univariate case-trend analysis
Abstract
Chickenpox, caused by the varicella-zoster virus (VZV), is a problematic infectious disease with regular outbreaks occurring seasonally in most countries. Accurately predicting the expected number of cases in future weeks from historical case-trend information is an important goal both locally and nationally. Space- and time-related attributes are extracted from the case-number trends for the previous 12 weeks of historical VZV cases recorded in Hungary. These attributes are able to generate reliable predictions of expected VZV cases for multiple weeks ahead. Supervised machine learning (SML) combined with feature-selection optimizers can identify the most effective combinations of 15 local time-series trend attributes. These features are complemented by an additional 10 regional trend attributes that provide the spatial dimension. The most practical combination of influential trend attributes varies depending on the number of weeks ahead being forecast. SML models are developed using weekly VZV case data (2005–2014) for the regions of Hungary, focusing on the region of Komarom-Esztergom (Kom), northwest of Budapest. SML predictions for up to 4 weeks ahead are most strongly influenced by the local time-series attributes, including moving averages (MAs) and seasonality components from recent weeks. However, for predictions further forward (up to 13 weeks), the SML models also exploit regional trend attributes related to the recent rate of change in VZV case numbers to provide effective predictions. The proposed trend-attribute method provides more accurate case predictions than the commonly used univariate case-forecasting methods relying on MA and autoregressive integrated moving average (ARIMA) models. The applied method also provides a means of data mining the most influential trend attributes and the time ranges of their effectiveness.
The flexibility and transparency of the technique provide a robust method that could be applied for forecasting short-term epidemiological case numbers associated with other infectious diseases.
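The trend-attribute idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`trend_attributes`, `build_dataset`), the five attributes shown, and the toy case counts are all assumptions made for illustration, and the paper's actual 15 local and 10 regional attributes are not reproduced here. The sketch only shows how attributes computed from the previous 12 weeks can be paired with the case count a chosen number of weeks ahead, giving rows that a supervised learner could be trained on.

```python
from statistics import mean


def trend_attributes(cases, t, window=12):
    """Illustrative trend attributes from the `window` weeks ending at index t.

    The attribute names are hypothetical; they stand in for the kinds of
    moving-average and rate-of-change features mentioned in the abstract.
    """
    if t + 1 < window:
        raise ValueError("not enough history before index t")
    recent = cases[t - window + 1 : t + 1]
    return {
        "last_week": recent[-1],
        "ma_4": mean(recent[-4:]),                 # 4-week moving average
        "ma_12": mean(recent),                     # moving average over the full window
        "roc_1": recent[-1] - recent[-2],          # 1-week change
        "roc_4": (recent[-1] - recent[-5]) / 4,    # average weekly change over 4 weeks
    }


def build_dataset(cases, weeks_ahead, window=12):
    """Pair each week's attributes with the case count `weeks_ahead` later."""
    rows = []
    for t in range(window - 1, len(cases) - weeks_ahead):
        rows.append((trend_attributes(cases, t, window), cases[t + weeks_ahead]))
    return rows


if __name__ == "__main__":
    # Toy weekly counts, invented for illustration only (not the Hungarian data).
    toy_cases = [30, 34, 41, 38, 45, 52, 49, 40, 33, 28, 25, 22, 20, 19, 24, 31, 37, 44]
    for features, target in build_dataset(toy_cases, weeks_ahead=4):
        print(features, "->", target)
```

Regional (spatial) attributes could be produced the same way by applying `trend_attributes` to the series of neighbouring regions and merging the resulting dictionaries under region-specific keys before training.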
Received: 3 September 2023 | Revised: 4 October 2023 | Accepted: 19 October 2023
Conflicts of Interest
The author declares that he has no conflicts of interest to this work.
Data Availability Statement
The data that support the findings of this study are openly available in UC Irvine Machine Learning Repository at https://doi.org/10.24432/C5103B
License
Copyright (c) 2023 Author
This work is licensed under a Creative Commons Attribution 4.0 International License.