Pandemics In Silico: Scaling an Agent-Based Simulation on Realistic Social Contact Networks
Joy Kitson, Ian Costello, Jiangzhuo Chen, Diego Jiménez, Stefan Hoops, Henning Mortveit, Esteban Meneses, Jae-Seung Yeom, Madhav V. Marathe, Abhinav Bhatele
TL;DR
Loimos addresses the need for fast, scalable agent-based epidemic simulations on realistic social contact networks. It introduces a hybrid discrete-event/time-stepping framework built on Charm++ to model contagion across a population–location graph, with modular disease, intervention, and performance-optimization capabilities. The study validates Loimos against EpiHiper and demonstrates strong and weak scaling on the Perlmutter supercomputer, achieving rapid, large-scale simulations (e.g., 200 days in ~42 seconds on 4096 cores) and identifying optimizations that substantially reduce runtime. The work has practical impact for policy analysis by enabling rapid exploration of intervention scenarios at national to regional scales on HPC resources.
Abstract
Preventing the spread of infectious diseases requires implementing interventions at various levels of government and evaluating the potential impact and efficacy of those preemptive measures. Agent-based modeling can be used for detailed studies of epidemic diffusion and possible interventions. Modeling epidemic diffusion in large social contact networks requires the use of parallel algorithms and resources. In this work, we present Loimos, a scalable parallel framework for simulating epidemic diffusion. Loimos uses a hybrid of time-stepping and discrete-event simulation to model disease spread, and is implemented on top of an asynchronous, many-task runtime. We demonstrate that Loimos is able to achieve significant speedups while scaling to large core counts. In particular, Loimos is able to simulate 200 days of a COVID-19 outbreak on a digital twin of California in about 42 seconds, for an average of 4.6 billion traversed edges per second (TEPS), using 4096 cores on Perlmutter at NERSC.
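
To put the headline figures in perspective, the short sketch below works through the arithmetic they imply. It assumes the reported average TEPS is taken over the full wall-clock time of the run; the abstract does not state this explicitly. The function and constant names are illustrative and are not part of Loimos. Because the runtime is quoted as "about 42 seconds", the derived values are rough estimates only.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the abstract.
# Assumption: the average TEPS is measured over the whole wall-clock run.

AVG_TEPS = 4.6e9        # average traversed edges per second (from the abstract)
WALL_SECONDS = 42.0     # approximate wall-clock runtime (from the abstract)
SIMULATED_DAYS = 200    # simulated outbreak length (from the abstract)
CORES = 4096            # cores used on Perlmutter (from the abstract)


def total_edges_traversed(teps: float, wall_seconds: float) -> float:
    """Total edge traversals implied by an average rate and a runtime."""
    return teps * wall_seconds


def edges_per_simulated_day(teps: float, wall_seconds: float, days: int) -> float:
    """Average edge traversals needed to advance the simulation by one day."""
    return total_edges_traversed(teps, wall_seconds) / days


total = total_edges_traversed(AVG_TEPS, WALL_SECONDS)
per_day = edges_per_simulated_day(AVG_TEPS, WALL_SECONDS, SIMULATED_DAYS)
per_core_rate = AVG_TEPS / CORES
seconds_per_day = WALL_SECONDS / SIMULATED_DAYS

print(f"total edges traversed:   {total:.3e}")
print(f"edges per simulated day: {per_day:.3e}")
print(f"TEPS per core:           {per_core_rate:.3e}")
print(f"wall time per sim. day:  {seconds_per_day:.2f} s")
```

Under these assumptions, the run traverses on the order of 1.9 × 10^11 edges in total, or roughly 10^9 edges per simulated day, with each simulated day taking about 0.21 seconds of wall-clock time.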
