Bayesian calibration of stochastic agent based model via random forest

Connor Robertson, Cosmin Safta, Nicholson Collier, Jonathan Ozik, Jaideep Ray

TL;DR

The paper tackles the computational burden of calibrating stochastic agent-based models (ABMs) for epidemiology by developing a random forest surrogate that maps a reduced set of ABM parameters to time-series outputs represented in a PCA basis. Because the surrogate operates on a four-parameter subspace, it makes Bayesian calibration with MCMC (DRAM) affordable, yielding posterior pushforwards and predictive checks comparable to those of the previous IMABC calibration at far lower cost. The CityCOVID case study shows that four parameters capture the dominant influence on hospitalizations and deaths, with the retained PCA modes explaining more than 95% of the variance and the surrogate reaching median relative errors of a few percent in cross-validation. Overall, the approach offers a practical, scalable route to probabilistic ABM calibration with quantified uncertainty, reported through explicit metrics such as CRPS and DIC, while acknowledging residual model-form error and limitations in the treatment of stochasticity.

Abstract

Agent-based models (ABMs) provide an excellent framework for modeling outbreaks and interventions in epidemiology by explicitly accounting for diverse individual interactions and environments. However, these models are usually stochastic and highly parametrized, requiring precise calibration for predictive performance. When considering realistic numbers of agents and properly accounting for stochasticity, this high-dimensional calibration can be computationally prohibitive. This paper presents a random-forest-based surrogate modeling technique to accelerate the evaluation of ABMs and demonstrates its use to calibrate an epidemiological ABM named CityCOVID via Markov chain Monte Carlo (MCMC). The technique is first outlined in the context of CityCOVID's quantities of interest, namely hospitalizations and deaths, by exploring dimensionality reduction via temporal decomposition with principal component analysis (PCA) and via sensitivity analysis. The calibration problem is then presented, and samples are generated to best match COVID-19 hospitalization and death numbers in Chicago from March to June 2020. These results are compared with previous approximate Bayesian calibration (IMABC) results, and their predictive performance is analyzed, showing improved performance at reduced computational cost.
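The workflow summarized in the abstract (PCA over the time axis, a random forest from parameters to PCA coefficients, then MCMC against the cheap surrogate) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_simulator`, the parameter bounds, the noise level `sigma`, and the forest settings are hypothetical stand-ins rather than CityCOVID or the authors' configuration, and the sampler is a plain random-walk Metropolis, not the DRAM scheme used in the paper. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
t = np.arange(100.0)                      # time axis in days
lo = np.array([0.0, 45.0, 12.0, 50.0])    # hypothetical prior lower bounds
hi = np.array([1.0, 60.0, 20.0, 200.0])   # hypothetical prior upper bounds

def toy_simulator(theta):
    """Hypothetical stand-in for a seed-averaged ABM run (not CityCOVID)."""
    growth, peak_day, width, scale = theta
    bump = np.exp(-0.5 * ((t - peak_day) / width) ** 2)
    return scale * bump * (1.0 + growth * t / t[-1])

# 1. Training set: sample the 4-D parameter box, simulate one trajectory each.
theta_train = rng.uniform(lo, hi, size=(400, 4))
Y_train = np.array([toy_simulator(th) for th in theta_train])

# 2. Temporal decomposition: compress each trajectory to 4 PCA coefficients.
pca = PCA(n_components=4).fit(Y_train)
Z_train = pca.transform(Y_train)

# 3. Random forest maps parameters to PCA coefficients (multi-output fit).
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(theta_train, Z_train)

def surrogate(theta):
    """Predict PCA coefficients, then reconstruct full trajectories."""
    z = forest.predict(np.atleast_2d(theta))
    return pca.inverse_transform(z)

# 4. Held-out check against a predict-the-mean baseline.
theta_test = rng.uniform(lo, hi, size=(50, 4))
Y_test = np.array([toy_simulator(th) for th in theta_test])
mse_surrogate = np.mean((surrogate(theta_test) - Y_test) ** 2)
mse_baseline = np.mean((Y_train.mean(axis=0) - Y_test) ** 2)

# 5. Synthetic "observations" and a Gaussian log-posterior with a uniform prior.
theta_true = np.array([0.5, 52.0, 15.0, 120.0])
sigma = 5.0                               # assumed observation noise
y_obs = toy_simulator(theta_true) + rng.normal(0.0, sigma, size=t.size)

def log_post(theta):
    if np.any(theta < lo) or np.any(theta > hi):
        return -np.inf                    # outside the uniform prior box
    resid = y_obs - surrogate(theta)[0]
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# 6. Random-walk Metropolis on the surrogate (simpler than DRAM).
step = 0.05 * (hi - lo)
chain = np.empty((500, 4))
current = 0.5 * (lo + hi)
current_lp = log_post(current)
n_accept = 0
for i in range(chain.shape[0]):
    proposal = current + step * rng.normal(size=4)
    proposal_lp = log_post(proposal)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
        n_accept += 1
    chain[i] = current
accept_rate = n_accept / chain.shape[0]
```

Each posterior evaluation costs one forest prediction instead of a full simulation, which is where the computational saving comes from. In practice the number of PCA modes, the forest size, and the proposal scale would need tuning and validation on held-out simulator runs.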

Paper Structure (13 sections, 8 equations, 9 figures, 6 tables)

This paper contains 13 sections, 8 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Range of hospitalization and death trajectories for the observed data in Chicago from March to June 2020 (black) and for the CityCOVID simulations (blue) run at parameter values from the 4-dimensional quasi-random hypercube used to train and test the surrogate. CityCOVID outputs are averaged across random seeds.
  • Figure 2: (a) Scree plot demonstrating approximation power of different numbers of PCA modes with a dotted line at the number of modes used for the surrogate. (b) Median absolute relative error for surrogate reconstructions of CityCOVID hospitalization and death trajectories.
  • Figure 3: Accuracy of the data reconstructed using 4 principal components. The components capture over 95% of the variance of the data.
  • Figure 4: Marginal and pairwise posterior samples from DRAM using the random forest surrogate.
  • Figure 5: Marginal samples from three sources: the posterior computed with sequential, rejection-based IMABC in a previous calibration (Ozik et al., 2021), the prior, and the DRAM posterior obtained using the random forest surrogate.
  • ...and 4 more figures