Mantis: A Simulation-Grounded Foundation Model for Disease Forecasting

Carson Dudley, Reiden Magdaleno, Christopher Harding, Ananya Sharma, Emily Martin, Marisa Eisenberg

TL;DR

Mantis, a foundation model trained entirely on mechanistic simulations, enables out-of-the-box forecasting across diseases, regions, and outcomes, even in settings with limited historical data, and is deployable where traditional models fail.

Abstract

Infectious disease forecasting in novel outbreaks or low-resource settings is hampered by the need for disease-specific data, bespoke training, and expert tuning. We introduce Mantis, a foundation model trained entirely on mechanistic simulations, which enables out-of-the-box forecasting across diseases, regions, and outcomes, even in settings with limited historical data. We evaluated Mantis against 48 forecasting models across six diseases with diverse transmission modes, assessing both point forecast accuracy (mean absolute error) and probabilistic performance (weighted interval score and coverage). Despite using no real-world data during training, Mantis achieved lower mean absolute error than all models in the CDC's COVID-19 Forecast Hub when backtested on early pandemic forecasts. Across all other diseases tested, including respiratory, vector-borne, and waterborne pathogens, Mantis consistently ranked in the top two models across all evaluation metrics. Notably, Mantis generalized to diseases with transmission mechanisms not represented in its training data, demonstrating that it captures fundamental contagion dynamics rather than memorizing disease-specific patterns. These capabilities position Mantis as a practical foundation for disease forecasting: general-purpose, accurate, and deployable where traditional models fail.
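The three evaluation metrics named above can be made concrete with a short sketch. The abstract does not give the paper's exact scoring code, so the version below assumes the standard definitions used in forecast-hub evaluations: mean absolute error of the point forecast, the weighted interval score built from central prediction intervals plus the median, and empirical coverage as the fraction of observations that fall inside an interval. All function names and the example numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds of a central prediction interval


def mean_absolute_error(points: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap between point forecasts and observations."""
    return sum(abs(p - y) for p, y in zip(points, observed)) / len(observed)


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Score for one central (1 - alpha) interval: width plus penalties for misses."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)  # observation under the interval
    above = (2.0 / alpha) * max(y - upper, 0.0)  # observation over the interval
    return width + below + above


def weighted_interval_score(median: float, intervals: Dict[float, Interval], y: float) -> float:
    """WIS for one forecast; `intervals` maps alpha -> (lower, upper). Lower is better."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def empirical_coverage(intervals: Sequence[Interval], observed: Sequence[float]) -> float:
    """Fraction of observations that land inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, observed) if lo <= y <= hi)
    return hits / len(observed)


# Hypothetical forecast: median 100 deaths, 50% interval (90, 110), 95% interval (70, 130);
# the observed value is 120, which misses the 50% interval but not the 95% one.
wis = weighted_interval_score(100.0, {0.5: (90.0, 110.0), 0.05: (70.0, 130.0)}, 120.0)
mae = mean_absolute_error([100.0, 100.0], [95.0, 120.0])
cov = empirical_coverage([(90.0, 110.0), (90.0, 110.0)], [95.0, 120.0])
```

A well-calibrated model should show empirical coverage close to the nominal level of each interval (for example, about half of observations inside the 50% interval), while a lower weighted interval score rewards intervals that are both narrow and rarely missed.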

Paper Structure

This paper contains 107 sections, 72 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Conceptual overview of Mantis. Mantis is a simulation-grounded foundation model trained entirely on synthetic outbreaks generated by mechanistic epidemiological models. The training pipeline begins with a modular simulator that encodes diverse outbreak mechanisms, including multiple transmission modes (human-to-human, vector-borne, environmental), progression dynamics, intervention strategies, and population structures. These simulations are then passed through an observation model that incorporates real-world surveillance effects such as underreporting, stochasticity, and reporting delays. Trained on over 400 million simulated days, Mantis forecasts directly from real-world time series at inference time. Key advantages include accurate out-of-the-box performance, long-range forecasting capability, and mechanistic interpretability via back-to-simulation attribution.
  • Figure 2: Covariate integration improves accuracy, and Mantis maintains calibrated uncertainty across forecast horizons. (a) Including covariates (e.g., using cases to predict hospitalizations) consistently improves Mantis's accuracy across all forecast horizons. Relative MAE shown for COVID-19 mortality forecasts with (blue) and without (orange) hospitalization covariates across 2-, 4-, 6-, and 8-week horizons. (b) Mantis's prediction interval widths increase appropriately with forecast horizon, reflecting growing uncertainty in longer-range predictions. When forecasting COVID-19 mortality 1 to 4 weeks ahead, the 50% prediction interval width (blue) grows from approximately 50 to 80 deaths, while the 95% interval width (orange) expands from approximately 110 to 180 deaths. This systematic widening shows that the model becomes appropriately less confident as the prediction horizon increases.
  • Figure 3: Mantis Produces Accurate and Generalizable Forecasts Across Diseases and Geographies. (a) Four-week-ahead forecasts (blue dashed line and shaded 90% CI) compared to observed outcomes (black) for COVID-19 mortality in Minnesota and influenza-like illness (ILI) in Michigan. In the latter, Mantis demonstrates its foundation model capacity by accurately forecasting syndromic inputs despite never being trained with syndromic data. (b) Eight-week-ahead forecasts for four historical outbreaks, highlighting Mantis’s ability to generalize across time, space, and transmission mode. In Alagoas, Brazil, Mantis forecasts the 2004 dengue surge 18 weeks early, a failure in temporal calibration attributed to the absence of covariates, but it still estimates the peak height with remarkable accuracy. Forecasts for hepatitis B in Florida (a chronic, bloodborne disease type absent from Mantis’s training set) further demonstrate its capacity to generalize to out-of-distribution disease profiles while maintaining forecast stability and accuracy.
  • Figure 4: Mantis delivers consistent performance across population scales. Relative MAE versus state population for COVID-19 mortality forecasts across 51 U.S. states and territories (Vermont excluded as an outlier). Each point represents the mean relative MAE for a jurisdiction across all forecast dates from April 2020 through November 2021. Population is shown on a logarithmic scale (2020 Census). A weak negative correlation ($R^2 = 0.06$) indicates slightly better performance in larger states, but the low coefficient of determination suggests that Mantis achieves relatively uniform accuracy across diverse population sizes.
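
The population-scale check described in Figure 4 amounts to a simple linear fit of relative MAE against log population. The sketch below is one way to compute the slope and coefficient of determination under that reading; the function name and the three data points are made up for illustration and are not values from the paper.

```python
import math
from typing import Sequence, Tuple


def log_population_fit(
    populations: Sequence[float], relative_mae: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of relative MAE on log10(population).

    Returns (slope, intercept, r_squared). A negative slope means larger
    jurisdictions tend to have lower relative MAE; r_squared near zero means
    population explains little of the variation in accuracy.
    """
    x = [math.log10(p) for p in populations]
    y = list(relative_mae)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Made-up example: three jurisdictions with perfectly linear, decreasing relative MAE.
slope, intercept, r_squared = log_population_fit(
    [1e5, 1e6, 1e7], [1.2, 1.0, 0.8]
)
```

With real per-jurisdiction values, an $R^2$ as low as the 0.06 reported in the caption would mean that only about 6% of the variance in relative MAE is associated with log population, which is the basis for describing accuracy as relatively uniform across population sizes.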