Spatio-Temporal Attributes of Varicella Zoster Case Number Trends Assist with Optimizing Machine Learning Predictions
Keywords: weeks-ahead VZV case forecasting, spatio-temporal trend attributes, feature importance analysis, optimized machine learning, epidemiological univariate case-trend analysis
The varicella zoster virus (VZV), which causes chickenpox, is responsible for a problematic infectious disease with regular outbreaks occurring seasonally in most countries. Being able to predict accurately the expected number of cases in future weeks from historical case-trend information is an important goal both locally and nationally. Here, space- and time-related attributes are extracted from the case-number trends for the previous twelve weeks of historical VZV cases recorded in Hungary. These attributes are able to generate reliable predictions of expected VZV cases for multiple weeks ahead. Supervised machine learning (SML) combined with feature-selection optimizers can identify the most effective combinations of the fifteen local time-series trend attributes available. These features are complemented by an additional ten regional trend attributes that provide the spatial dimension. The most practical combination of influential trend attributes varies depending on the number of weeks ahead being forecast. SML models are developed using weekly VZV case data (2005-2014) for the regions of Hungary, focusing on the region of Komarom-Esztergom (Kom), northwest of Budapest. SML predictions for up to four weeks ahead are most strongly influenced by the local time-series attributes, including moving averages and seasonality components from the most recent weeks. However, for predictions further forward (up to thirteen weeks), the SML models also exploit regional trend attributes related to the recent rate of change in VZV case numbers to provide effective predictions. The proposed trend-attribute method provides more accurate case predictions than the commonly used univariate case-forecasting methods that rely on moving-average and ARIMA models. The applied method also provides a means of data mining the most influential trend attributes and the time ranges over which they are effective.
The flexibility and transparency of the technique provide a robust method that could be applied for forecasting short-term epidemiological case numbers associated with other infectious diseases.
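As a rough illustration of the trend-attribute idea described above, the following minimal Python sketch derives a few moving-average and rate-of-change attributes from the previous twelve weeks of a local and a regional case series, and pairs them with the local case count a chosen number of weeks ahead. The function names (`trend_attributes`, `build_samples`) and the five attributes shown are assumptions made for illustration only; they are not the fifteen local and ten regional attributes used in the study, and the resulting samples would still need to be passed to an SML model and a feature-selection optimizer.

```python
from statistics import mean


def trend_attributes(series, t, window=12):
    """Illustrative trend attributes from the `window` weeks ending at index t.

    The attribute set here is hypothetical, not the one used in the paper.
    """
    if t + 1 < window:
        raise ValueError("not enough history before index t")
    recent = series[t - window + 1 : t + 1]
    return {
        "last": recent[-1],                # most recent weekly case count
        "ma4": mean(recent[-4:]),          # 4-week moving average
        "ma12": mean(recent),              # 12-week moving average
        "roc1": recent[-1] - recent[-2],   # 1-week change in cases
        "roc4": recent[-1] - recent[-5],   # 4-week change in cases
    }


def build_samples(local, regional, horizon, window=12):
    """Pair attributes known at week t with local cases at week t + horizon."""
    X, y = [], []
    for t in range(window - 1, len(local) - horizon):
        row = {f"local_{k}": v
               for k, v in trend_attributes(local, t, window).items()}
        row.update({f"regional_{k}": v
                    for k, v in trend_attributes(regional, t, window).items()})
        X.append(row)
        y.append(local[t + horizon])
    return X, y
```

Calling `build_samples(local, regional, horizon=h)` once per forecast horizon `h` gives one supervised data set per horizon, which matches the observation that the most useful attribute combination depends on how many weeks ahead are being forecast.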
Copyright (c) 2023 Author
This work is licensed under a Creative Commons Attribution 4.0 International License.