Bayesian calibration of stochastic agent based model via random forest
Connor Robertson, Cosmin Safta, Nicholson Collier, Jonathan Ozik, Jaideep Ray
TL;DR
The paper tackles the computational burden of calibrating stochastic agent-based models (ABMs) for epidemiology by developing a random-forest surrogate that maps a reduced set of ABM parameters to time-series outputs compressed with PCA. Because the surrogate operates on a four-parameter subspace, it makes Bayesian calibration with MCMC (DRAM) tractable, yielding posterior pushforwards and predictive checks that are comparable to the previous IMABC calibration at far lower cost. The CityCOVID case study demonstrates that four parameters capture the dominant influence on hospitalizations and deaths, with PCA explaining >95% of the variance and the surrogate achieving median relative errors of a few percent in cross-validation. Overall, the approach offers a practical, scalable pathway to probabilistic ABM calibration, with quantified uncertainty and explicit performance metrics such as CRPS and DIC, while acknowledging residual model-form error and limitations in how stochasticity is handled.
Abstract
Agent-based models (ABMs) provide an excellent framework for modeling outbreaks and interventions in epidemiology by explicitly accounting for diverse individual interactions and environments. However, these models are usually stochastic and highly parametrized, requiring precise calibration for predictive performance. When considering realistic numbers of agents and properly accounting for stochasticity, this high-dimensional calibration can be computationally prohibitive. This paper presents a random-forest-based surrogate modeling technique to accelerate the evaluation of ABMs and demonstrates its use to calibrate an epidemiological ABM named CityCOVID via Markov chain Monte Carlo (MCMC). The technique is first outlined in the context of CityCOVID's quantities of interest, namely hospitalizations and deaths, by exploring dimensionality reduction via temporal decomposition with principal component analysis (PCA) and via sensitivity analysis. The calibration problem is then presented, and samples are generated to best match COVID-19 hospitalization and death numbers in Chicago from March to June 2020. These results are compared with previous approximate Bayesian calibration (IMABC) results, and their predictive performance is analyzed, showing improved performance at reduced computational cost.
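The workflow summarized above (compress the time series with PCA, fit a random forest from parameters to PCA scores, then run MCMC against the cheap surrogate) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_abm` is a hypothetical stand-in for CityCOVID, the unit-cube parameter ranges, ensemble size, and noise level `sigma` are invented for the example, and a plain random-walk Metropolis sampler is used in place of DRAM for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def toy_abm(theta, rng):
    """Hypothetical stochastic simulator: 4 parameters -> noisy 60-day curve."""
    t = np.arange(60)
    peak = 20.0 + 20.0 * theta[1]
    width = 5.0 + 10.0 * theta[2]
    curve = 100.0 * theta[0] * np.exp(-((t - peak) ** 2) / (2.0 * width ** 2))
    return curve + theta[3] * t + rng.normal(0.0, 1.0, t.size)

# 1. Training ensemble: sample parameters (assumed uniform on [0, 1]^4) and run the model.
thetas = rng.uniform(0.0, 1.0, size=(400, 4))
runs = np.array([toy_abm(th, rng) for th in thetas])

# 2. Temporal decomposition: keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
scores = pca.fit_transform(runs)

# 3. Surrogate: a random forest maps parameters to PCA scores.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(thetas, scores)

def surrogate(theta):
    """Predict the full time series for one parameter vector."""
    z = forest.predict(np.atleast_2d(theta))
    return pca.inverse_transform(z.reshape(1, -1))[0]

# 4. Synthetic "observations" from a known parameter vector (assumption for the demo).
theta_true = np.array([0.6, 0.4, 0.5, 0.3])
observed = toy_abm(theta_true, rng)
sigma = 3.0  # assumed observation-noise standard deviation

def log_post(theta):
    """Gaussian log-likelihood with a uniform prior on the unit cube."""
    if np.any(theta < 0.0) or np.any(theta > 1.0):
        return -np.inf
    resid = observed - surrogate(theta)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

def metropolis(n_steps, step=0.05):
    """Random-walk Metropolis sampler (a simpler stand-in for DRAM)."""
    theta = np.full(4, 0.5)
    lp = log_post(theta)
    chain = np.empty((n_steps, 4))
    for i in range(n_steps):
        prop = theta + step * rng.normal(size=4)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

chain = metropolis(1000)
posterior_mean = chain[300:].mean(axis=0)  # discard the first 300 steps as burn-in
```

Each posterior evaluation costs one forest prediction rather than one ABM run, which is where the computational saving comes from. Pushing samples from `chain` back through `surrogate` (or through the simulator itself) would give the kind of posterior pushforward used for predictive checks.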
