COVID-19 Forecasting from U.S. Wastewater Surveillance Data: A Retrospective Multi-Model Study (2022-2024)

Faharudeen Alhassan, Hamed Karami, Amanda Bleichrodt, James M. Hyman, Isaac C. H. Fung, Ruiyan Luo, Gerardo Chowell

Abstract

Accurate and reliable forecasting models are critical for guiding public health responses and policy decisions during pandemics such as COVID-19. Retrospective evaluation of model performance is essential for improving epidemic forecasting capabilities. In this study, we used COVID-19 wastewater data from CDC's National Wastewater Surveillance System to generate sequential weekly retrospective forecasts for the United States from March 2022 through September 2024, both at the national level and for four major regions (Northeast, Midwest, South, and West). We produced 133 weekly forecasts using 11 models, including ARIMA, generalized additive models (GAM), simple linear regression (SLR), Prophet, and the n-sub-epidemic framework (top-ranked, weighted-ensemble, and unweighted-ensemble variants). Forecast performance was assessed using mean absolute error (MAE), mean squared error (MSE), weighted interval score (WIS), and 95% prediction interval coverage. The n-sub-epidemic unweighted ensembles outperformed all other models at 3-4-week horizons, particularly at the national level and in the Midwest and West. ARIMA and GAM performed best at 1-2-week horizons in most regions, whereas Prophet and SLR consistently underperformed across regions and horizons. These findings highlight the value of region-specific modeling strategies and demonstrate the utility of the n-sub-epidemic framework for real-time outbreak forecasting using wastewater surveillance data.
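The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each forecast is summarized by a median and a set of central prediction intervals, that point errors are taken against the median, and that WIS follows the standard interval-score definition; the function names and the `forecasts` record layout are hypothetical and are not taken from the paper's code.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    width = upper - lower
    under = (2.0 / alpha) * max(lower - y, 0.0)  # penalty if y is below the interval
    over = (2.0 / alpha) * max(y - upper, 0.0)   # penalty if y is above the interval
    return width + under + over


def weighted_interval_score(median, intervals, y):
    """WIS from a median and a dict {alpha: (lower, upper)} of central intervals."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def summarize(forecasts):
    """Average MAE, MSE, WIS and 95% PI coverage (in percent).

    forecasts: list of dicts with keys 'y', 'median', 'intervals' (assumed layout);
    each 'intervals' dict must contain alpha = 0.05 for the 95% interval.
    """
    n = len(forecasts)
    abs_err = [abs(f["y"] - f["median"]) for f in forecasts]
    sq_err = [(f["y"] - f["median"]) ** 2 for f in forecasts]
    wis = [weighted_interval_score(f["median"], f["intervals"], f["y"]) for f in forecasts]
    covered = [f["intervals"][0.05][0] <= f["y"] <= f["intervals"][0.05][1] for f in forecasts]
    return {
        "MAE": sum(abs_err) / n,
        "MSE": sum(sq_err) / n,
        "WIS": sum(wis) / n,
        "coverage95": 100.0 * sum(covered) / n,
    }


# Two made-up forecasts: one observation inside the 95% interval, one above it.
example = [
    {"y": 5.0, "median": 4.0, "intervals": {0.05: (2.0, 8.0)}},
    {"y": 10.0, "median": 4.0, "intervals": {0.05: (2.0, 8.0)}},
]
scores = summarize(example)
```

With only one interval per forecast the WIS reduces to a weighted mix of the absolute error and the 95% interval score; supplying more interval levels in `intervals` would give a finer approximation of the full predictive distribution.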

Paper Structure

This paper contains 30 sections, 18 equations, 20 figures.

Figures (20)

  • Figure 1: Time-series comparison of wastewater viral activity level (WVAL) and crude COVID-19 hospitalization rates from January 2022 to September 2024. The top panel illustrates the WVAL detected in wastewater surveillance data, and the bottom panel represents the crude hospitalization rate over the same period. Both datasets exhibit periodic fluctuations, with notable peaks aligning at several time points, suggesting a potential correlation between wastewater viral activity and hospital admissions. The observed trends indicate that increases in wastewater viral concentration often precede corresponding surges in hospitalization rates, highlighting the potential utility of wastewater surveillance as an early warning system for monitoring COVID-19 outbreaks. This relationship emphasizes the importance of integrating wastewater-based epidemiology into public health decision-making. The lead time between WVAL peaks and hospitalization surges appears to vary by region and outbreak intensity, typically ranging from 2-12 days based on recent U.S. surveillance data (schenk2024sars).
  • Figure 2: Forecasts of WVAL at the national level for an example forecast date (originating on September 7, 2024) across all models. Each panel shows the median prediction (red solid line) and the corresponding 95% prediction interval (gray shaded region bounded by black dashed lines). Black circles denote calibration data, while red circles indicate observations used for evaluation. The vertical dashed line marks the end of the calibration period. This example illustrates the varying widths of prediction intervals across models, with ensemble approaches generally providing wider but more reliable uncertainty bounds than single models such as SLR or Prophet.
  • Figure 3: Forecasts of WVAL for the Midwest region for an example forecast date (originating on September 7, 2024) across all models. Each panel shows the median prediction (red solid line) and the corresponding 95% prediction interval (gray shaded region bounded by black dashed lines). Black circles denote calibration data, while red circles indicate observations used for evaluation; the vertical dashed line marks the end of the calibration period. The GAM model exhibits particularly tight prediction intervals during calibration, making its gray region difficult to distinguish from the median curve. The sub-epidemic models more accurately track the observed declining trend, whereas several statistical models tend to overestimate future WVAL values.
  • Figure 4: Log-transformed averages of MAE, MSE, and WIS, and the (untransformed) average 95% PI coverage across all models for 1-4-week horizons at the national level, spanning the period from March 5, 2022, to September 14, 2024 (133 forecasts). Lighter shades indicate smaller (better) values for MAE, MSE, and WIS, whereas darker shades indicate larger errors. For coverage, values closer to 95% indicate better performance. Black cells highlight the best-performing model(s) for each forecast horizon and metric.
  • Figure 5: Temporal dynamics of forecast performance for the EM3 UW model. Top panel: Weekly viral activity levels (WVAL, log$_{10}$ scale) across five U.S. regions from March 2022 to September 2024. Lower panels: Heatmaps showing temporal variation in four performance metrics (MAE, MSE, WIS, and 95% PI coverage) for 4-week-ahead forecasts. Darker blue in the error metric panels indicates higher errors. The coverage panel uses inverted coloring (darker = higher coverage = better performance). Error magnitudes increase during high-WVAL periods, but this is partly expected because both WVAL and errors are on the same log scale; higher baseline values naturally yield larger absolute errors. The key scientific question is whether errors increase disproportionately during surges, which would indicate true model degradation during critical periods. Future analysis should examine skill scores (errors relative to a baseline) across different WVAL regimes to separate genuine performance degradation from scale-dependent error increases.
  • ...and 15 more figures
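
The skill-score analysis suggested in the Figure 5 caption could be sketched as follows. This is an illustration only: the skill score is taken here as one minus the ratio of summed model error to summed baseline error, and the baseline forecast, the two-regime split, and the `threshold` value are all assumptions rather than choices made in the paper.

```python
def skill_by_regime(records, threshold):
    """Relative skill (1 - model error / baseline error) per WVAL regime.

    records: iterable of (wval, model_error, baseline_error) tuples (assumed layout).
    threshold: hypothetical WVAL cut-off separating 'low' from 'high' activity.
    Returns None for a regime with no baseline error to compare against.
    """
    sums = {"low": [0.0, 0.0], "high": [0.0, 0.0]}
    for wval, model_err, base_err in records:
        regime = "high" if wval >= threshold else "low"
        sums[regime][0] += model_err
        sums[regime][1] += base_err
    return {
        regime: (1.0 - m / b) if b > 0 else None
        for regime, (m, b) in sums.items()
    }


# Made-up errors: the model halves the baseline error at low WVAL
# but only matches the baseline at high WVAL.
records = [
    (1.0, 0.5, 1.0),
    (2.0, 0.5, 1.0),
    (8.0, 3.0, 2.0),
    (9.0, 1.0, 2.0),
]
skill = skill_by_regime(records, threshold=5.0)
```

A positive value means the model beats the baseline in that regime. Because the ratio cancels the common scale of the errors, a drop in skill during high-WVAL periods would point to genuine degradation rather than to larger absolute values alone.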