Early warning of Mpox outbreaks in U.S. jurisdictions using Lasso Vector Autoregression models with cross-jurisdictional lags

Hannah Craddock, Joel O. Wertheim, Eliah Aronoff-Spencer, Mark Beatty, David Valentine, Rishi Graham, Jade C. Wang, Lior Rennert, Seema Shah, Ravi Goyal, Natasha K. Martin

TL;DR

Mpox exhibits episodic, spatially heterogeneous transmission, motivating area-specific forecasts. The authors fit a sparse vector autoregression with cross-jurisdictional lags (VAR-Lasso) to generate rolling two-week-ahead forecasts for eight high-incidence U.S. jurisdictions and to identify influential long-lag predictors. In a San Diego County case study, phylogenetic analysis corroborates the model's cross-jurisdictional signal by tracing a 2023 cluster to an earlier outbreak in Illinois. On slope-weighted error metrics, VAR-Lasso outperforms both a univariate AR-Lasso model and a naive moving-average benchmark across all observations. The approach can support earlier warnings and targeted public health action by leveraging inter-jurisdictional case dynamics alongside genomic evidence.

Abstract

Mpox is an orthopoxvirus that infects humans and animals and is transmitted primarily through close physical contact. The episodic and spatially heterogeneous dynamics of Mpox transmission underscore the need for timely, area-specific forecasts to support targeted public health responses in the U.S. We develop a Vector Autoregression model with Lasso regularization (VAR-Lasso) to generate rolling two-week-ahead forecasts of weekly Mpox cases for eight high-incidence U.S. jurisdictions using national surveillance data from the Centers for Disease Control and Prevention (CDC). The VAR-Lasso model identifies significant long-lag, cross-jurisdictional predictors. For a case study in San Diego County (SDC), these statistical predictors align with phylogenetic analysis that traces a 2023 cluster in SDC to an outbreak in Illinois six months earlier. As the need for public health action is often greatest when incidence is increasing, our performance evaluation focuses on positive-slope-weighted error metrics. Forecast performance of the VAR-Lasso model is compared to a univariate autoregressive (AR) Lasso model and a naive moving-average estimate. The models are compared using slope-weighted Root Mean Squared Error (RMSE), slope-weighted Mean Absolute Error (MAE), and slope-weighted bias. Across all observations, the VAR-Lasso model reduces slope-weighted RMSE, MAE, and bias by 12%, 7%, and 66% relative to the AR-Lasso model, and by 16%, 13%, and 76% relative to the naive benchmark. Our findings highlight the value of sparse multivariate time-series models that leverage cross-jurisdictional case data for early forecasting of Mpox outbreaks. Such forecasting can aid health departments in proactively providing timely resources and messaging to mitigate the risks of a future outbreak.
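
The forecasting setup in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the number of lags, the penalty `lam`, the coordinate-descent solver, the direct (rather than iterated) two-week-ahead regression, and every function name below are assumptions made for illustration, and the data are synthetic. The idea shown is that each jurisdiction's count is regressed on lagged counts from all jurisdictions, with an L1 penalty that drives most coefficients to zero so that only a few cross-jurisdictional lags remain.

```python
import random

def lagged_design(series, lags, horizon):
    """Rows of lagged counts from every jurisdiction (hypothetical helper).

    The row for target week t holds series[j][t - horizon - k] for each
    jurisdiction j (sorted by name) and each k in 0..lags-1.
    """
    names = sorted(series)
    n = len(series[names[0]])
    X, targets = [], []
    for t in range(horizon + lags - 1, n):
        X.append([series[j][t - horizon - k] for j in names for k in range(lags)])
        targets.append(t)
    return names, X, targets

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred columns
    resid = [v - y_mean for v in y]
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def fit_var_lasso(series, lags, horizon, lam):
    """One Lasso regression per target jurisdiction on the shared lagged design."""
    names, X, targets = lagged_design(series, lags, horizon)
    return {j: lasso_cd(X, [series[j][t] for t in targets], lam) for j in names}

def forecast(series, model, lags):
    """Predict each jurisdiction `horizon` weeks past the last observed week."""
    names = sorted(series)
    n = len(series[names[0]])
    x = [series[j][n - 1 - k] for j in names for k in range(lags)]
    return {j: b0 + sum(b * v for b, v in zip(beta, x))
            for j, (b0, beta) in model.items()}

# Synthetic data: jurisdiction "B" repeats jurisdiction "A" three weeks later.
rng = random.Random(7)
A = [10 * rng.random() for _ in range(80)]
B = [0.0, 0.0, 0.0] + A[:-3]
series = {"A": A, "B": B}

model = fit_var_lasso(series, lags=3, horizon=2, lam=0.1)
_, beta_B = model["B"]
top = max(range(len(beta_B)), key=lambda i: abs(beta_B[i]))
pred = forecast(series, model, lags=3)
print("largest coefficient index for B:", top)  # column order: A k=0..2, then B k=0..2
print("two-week-ahead forecast for B:", round(pred["B"], 2))
```

In this toy setting the dominant coefficient for B falls on A at k = 1, that is, a total lag of horizon + k = 3 weeks, which mirrors how a long cross-jurisdictional lag can surface as an influential predictor. A rolling evaluation would refit the model each week using only data available up to two weeks before the forecasted week.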

Paper Structure (19 sections, 9 equations, 8 figures, 4 tables)

This paper contains 19 sections, 9 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Weekly reported Mpox cases for the eight high-incidence U.S. jurisdictions from January 2023 to November 2024. Data were obtained from the U.S. Centers for Disease Control and Prevention (CDC) National Surveillance System.
  • Figure 2: Percentage improvements of the VAR-Lasso model predictions compared to the AR-Lasso model predictions and the naïve moving-average estimate predictions based on slope-weighted metrics. Slope-weighted RMSE, MAE, and bias are calculated for forecasts across all weekly predictions for the top eight high-incidence jurisdictions over 42 weeks from January to November 2024. Shown are the percentage improvements of the VAR-Lasso model predictions relative to the AR-Lasso predictions (% Improve VAR-AR) and relative to the Naïve predictions (% Improve VAR-Naive).
  • Figure 3: Weekly reported Mpox cases and forecasts for San Diego County, January–November 2024. Reported cases (black), VAR-Lasso forecasts (blue), AR-Lasso forecasts (magenta) and the naïve moving-average forecasts (green).
  • Figure 4: The top five VAR-Lasso model coefficients for San Diego County (SDC) and the weekly case counts for SDC and Illinois, 2023–2024. (a) Coefficients represent the mean magnitudes across all VAR-Lasso model fits used to forecast the 42 weeks in 2024. Each model is fit using all data from 2023 and data available up to two weeks prior to the forecasted week. The coefficients with the greatest magnitude correspond to Illinois at 23–24-week lags, followed by Los Angeles County (12-week lag), New York City (13-week lag), and Washington state (3-week lag). (b) Reported weekly Mpox cases for SDC and Illinois (2023–2024), showing Illinois peaks leading SDC by approximately 23 weeks, in agreement with phylogenetic evidence of introductions from Illinois.
  • Figure 5: Maximum clade credibility tree depicting migration of Mpox virus into San Diego County in late 2023. Branches are colored by inferred geographic location. The size of the circles on internal nodes represents posterior support, and their color represents geographic location. The inset shows the posterior probability for the location of the branch immediately preceding the ancestor of the SDC MPXV genomes.
  • ...and 3 more figures
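
The slope-weighted metrics and percentage improvements referenced in the Figure 2 caption can be sketched as follows. The exact weighting used in the paper is not stated in this excerpt, so the sketch assumes a weight equal to the positive part of the week-over-week change in observed cases, w_t = max(0, y_t - y_{t-1}), normalised by the total weight. The function names, the use of absolute bias in the improvement formula, and the toy numbers are likewise assumptions.

```python
import math

def positive_slope_weights(observed):
    """Assumed weighting: w_t = max(0, y_t - y_{t-1}); the first week gets 0."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(observed, observed[1:])]

def slope_weighted_metrics(observed, predicted):
    """Weighted RMSE, MAE, and bias, with error defined as predicted - observed."""
    w = positive_slope_weights(observed)
    total = sum(w)
    if total == 0:
        raise ValueError("no weeks with rising incidence to weight")
    err = [p - y for p, y in zip(predicted, observed)]
    return {
        "rmse": math.sqrt(sum(wi * e * e for wi, e in zip(w, err)) / total),
        "mae": sum(wi * abs(e) for wi, e in zip(w, err)) / total,
        "bias": sum(wi * e for wi, e in zip(w, err)) / total,
    }

def pct_improvement(model_value, baseline_value):
    """Percentage reduction in metric magnitude relative to a baseline."""
    return 100.0 * (abs(baseline_value) - abs(model_value)) / abs(baseline_value)

# Toy weekly counts (not from the paper): cases rise in weeks 2 and 3 only.
observed = [2, 4, 8, 6, 3]
model_forecast = [2, 3, 6, 7, 3]     # hypothetical multivariate model
baseline_forecast = [2, 2, 4, 8, 6]  # hypothetical lagging baseline

m = slope_weighted_metrics(observed, model_forecast)
b = slope_weighted_metrics(observed, baseline_forecast)
improvement = {k: pct_improvement(m[k], b[k]) for k in m}
print(improvement)
```

Under this weighting, weeks with flat or falling counts contribute nothing, so a forecast is scored only on how well it tracks rising incidence. A negative slope-weighted bias indicates under-prediction during growth, which is the failure mode an early-warning system most needs to avoid.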