On the Effect of Missing Transmission Chain Information in Agent-Based Models: Outcomes of Superspreading Events and Workplace Transmission

Sascha Korf, Sophia Johanna Wagner, Gerta Köster, Martin J. Kühn

TL;DR

This work tackles how missing transmission chain information in agent-based epidemic modeling biases early outbreak predictions. By coupling the Vadere microsimulation (detailed transmission and aerosol exposure) with the MEmilio ABM (large-scale, location-based spread), the authors compare transmission-informed versus case-number-based initializations across four outbreak scenarios (two restaurant, two workplace). They find substantial, scenario-dependent prediction errors (e.g., up to $46.0\%$ more infections by day $10$ in restaurant R1) driven by clustering versus dispersion in initial infections, with spatial and network structure (transmission trees) revealing the mechanisms behind these biases. The results underscore the need for data-augmented initialization or improved approximations when transmission chains are unavailable, informing when simplified initializations may suffice and when network-aware approaches are essential for reliable policy guidance during health emergencies.

Abstract

Agent-based models (ABMs) have emerged as distinguished tools for epidemic modeling due to their ability to capture detailed human contact patterns. ABMs can substantially support decision-makers in times of outbreaks and epidemics. However, because correspondingly resolved data are missing, transmission events are often modeled based on simplified assumptions. In this article, we present a framework to assess the impact of these simplifications on epidemic prediction outcomes, considering superspreading and workplace transmission events. We couple the VADERE microsimulation model with the large-scale MEmilio-ABM and compare the outcomes of four outbreak events after 10 days of simulation in a synthetic city district generated from German census data. In a restaurant superspreading event where up to four households share tables, we observe 17.2 % more infections on day 10 after the outbreak when using the simplified initialization. The difference increases to 46.0 % more infections in a setting where only two households share tables. We observe similar outcomes (41.3 % vs. 9.3 % more infections) for two workplace settings with different mixing patterns between teams at work. In addition to the aggregated difference, we show differences in spatial dynamics and transmission trees obtained with complete or reduced outbreak information. The differences between simplified and fully detailed initializations become more pronounced when the subnetworks in the outbreak setting mix less. Consequently, and aside from classical calibration of models, the significant outcome differences should drive us to develop a more profound understanding of how and where simplified assumptions about transmission events are adequate.
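To make the two initialization strategies concrete, the following is a minimal Python sketch, not the authors' implementation and not the MEmilio or Vadere API. All function names and the toy data (12 guests in 4 households of 3, with 4 clustered infections) are hypothetical. It only illustrates the idea that the transmission-informed variant keeps who was infected, while the simplified variant keeps only how many were infected and assigns them at random among the outbreak participants.

```python
import random


def transmission_informed(participants, infected_ids):
    """Keep exactly the agents that were infected in the outbreak simulation."""
    return {p: (p in infected_ids) for p in participants}


def uniform(participants, n_infected, rng):
    """Keep only the case count and pick the infected agents at random."""
    chosen = set(rng.sample(sorted(participants), n_infected))
    return {p: (p in chosen) for p in participants}


def affected_households(state, household_of):
    """Return the set of households with at least one infected member."""
    return {household_of[p] for p, infected in state.items() if infected}


# Hypothetical toy outbreak: 12 guests, households of size 3 (ids 0..3).
household_of = {agent: agent // 3 for agent in range(12)}
participants = list(household_of)
observed_infected = {0, 1, 2, 3}  # clustered in households 0 and 1

rng = random.Random(0)  # fixed seed so the sketch is reproducible
ti_state = transmission_informed(participants, observed_infected)
ui_state = uniform(participants, len(observed_infected), rng)

print("households seeded (transmission-informed):",
      len(affected_households(ti_state, household_of)))
print("households seeded (uniform):",
      len(affected_households(ui_state, household_of)))
```

Both variants seed the same number of cases. They differ only in how those cases are spread over households, which is the clustering-versus-dispersion effect the abstract refers to.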

Paper Structure

This paper contains 30 sections, 20 figures, 3 tables.

Figures (20)

  • Figure 1: Simulating and Comparing the Effect of Detailed Transmission Chain Initialization. The workflow depicts how we analyze the effect of missing transmission chain information in epidemic predictions through three stages. The outbreak simulation (left) provides a realistic data set that serves as ground truth with full information on the initial transmission chains. Using either this full set of information or aggregations thereof to case numbers (center), we initialize the city district ABM, which is then run for 10 days (right). The outbreak event location is depicted below the outbreak simulation and shows agents positioned according to proximity, with one initially infectious agent (in red) spreading the disease to others. The smaller boxes represent a symbolic excerpt of the city district with four locations, such as households with four persons each.
  • Figure 2: Contact Network Analysis for Baseline MEmilio-ABM and Population Structure. The figure presents an overview of a synthetic population comprising $N = 1000$ agents distributed across $H = 500$ households. Data are gathered from a 7-day MEmilio simulation with no infectious agents. Panel A displays the agent-based contact network, where individual agents appear as light blue circles with dark blue outlines, organized within household groupings. Contact relationships (number of hours spent at the same location) are visualized as colored edges connecting agents to location nodes (light blue squares) and back to the contacted agent, with colors corresponding to location types: blue for Work, orange for School, purple for SocialEvent, and yellow for BasicsShop. Edge thickness reflects contact intensity (more hours spent together). To ensure visual clarity, the network excludes Hospital/ICU locations (as no infections occur and severe status is not reached) and household contacts (as contact intensity is always high between household members). Panel B shows population-wide time allocation across all $N = 1000$ individuals. Panel C presents worker-specific patterns ($n_w = 431$ individuals). Panel D displays pupil allocation patterns ($n_s = 105$ individuals). Panel E reveals the demographic age structure, reflecting the German profile. Panel F shows the household size distribution across $H = 500$ households (single-person, two-person, three-person, four-person, and five-person households), also reflecting the German profile. Panel G presents the occupation distribution: Workers, Pupils, and Others, the latter representing infants, retirees, and non-employed agents. Panel H presents the potential contact distribution, which quantifies the number of distinct individuals with whom each individual spends at least one hour. All temporal data represent average daily patterns, with percentages normalized relative to 24-hour totals.
  • Figure 3: Scenario Overview Showing Final Infection Spread Across Different Scenarios and Initial Conditions. Panels show outbreak-related infected agents (red circles for newly infected agents and black circles for the initially infectious agent). Colored outlines reflect households (red and cyan outlined groups) for the restaurant setting and workplaces (violet outlined office pairs) for the work setting. Blue circles indicate individuals who are still susceptible after the simulation. Initially infectious agents emit virus particles, visualized as brown circles surrounding them. Panel A: Restaurant outbreak scenarios showing the spatial distribution of $n=89$ agents across dining tables with one initially infectious agent. The R1 excerpt (top inset, red outlined groups) shows household grouping at the two selected tables at the start of the simulation, while the R2 excerpt (bottom inset, cyan outlined groups) shows inter-household mixing at the start of the simulation. Households are shown only for the two tables where infections happen. Panel B: Workplace Scenario W1 featuring $n=26$ agents with few meetings and limited mixing. Panel C: Workplace Scenario W2 with identical office structure but more meetings and more mixing.
  • Figure 4: Epidemic Trajectories Comparing Transmission-Informed versus Uniform Initialization Across Four Outbreak Scenarios. Each panel shows cumulative infections over 10 days with 50 % (dark shaded) and 90 % (light shaded) confidence intervals from 100 simulations. Red lines represent transmission-informed initialization preserving ground truth infection data from Vadere's microsimulation, while blue lines show uniform initialization distributing infections randomly across outbreak participants. Vertical dashed lines indicate when each approach reaches 100 infections (UI: uniform, TI: transmission-informed). Final-day differences are shown in boxes with absolute case counts and percentage increases.
  • Figure 5: Heatmap of Spatially Distributed Infections for Restaurant Scenario R1 (Limited Mixing). We compare transmission-informed initialization (top row) versus uniform initialization (bottom row) across four time points: initialization, day 1, day 3, and day 10. Rectangular shapes represent households, with color intensity indicating the median number of infected members per household across 100 simulation runs (one color per household size, light shading for a low percentage of infected, dark shading for a high percentage). The center panel shows the median number of affected households over time.
  • ...and 15 more figures
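
The summary statistics named in the Figure 4 caption (shaded bands over 100 runs and a final-day percentage increase) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes the bands are central percentile ranges across runs and that the percentage increase is measured relative to the transmission-informed result. The function names and the input numbers are hypothetical placeholders, not values from the paper.

```python
import statistics


def relative_increase(ti_cases, ui_cases):
    """Percentage by which the uniform result exceeds the transmission-informed one.

    Assumption: the transmission-informed count is the reference value.
    """
    return 100.0 * (ui_cases - ti_cases) / ti_cases


def central_band(values, coverage):
    """Central percentile band covering `coverage` (e.g. 0.5 or 0.9) of the runs."""
    # quantiles(..., n=100) returns the 1st to 99th percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low_pct = round(50 * (1 - coverage))  # 0.9 -> 5th, 0.5 -> 25th percentile
    high_pct = 100 - low_pct
    return cuts[low_pct - 1], cuts[high_pct - 1]


# Hypothetical final-day case counts from 100 runs (placeholder data only).
runs = list(range(1, 101))
band_50 = central_band(runs, 0.5)
band_90 = central_band(runs, 0.9)
print("50 % band:", band_50)
print("90 % band:", band_90)

# Hypothetical counts chosen only to show the arithmetic.
print("relative increase in %:", relative_increase(200, 250))
```

Applied per simulation day, `central_band` would give the lower and upper edges of one shaded region, and `relative_increase` corresponds to the boxed final-day percentage.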