What Makes LLM Agent Simulations Useful for Policy Practice? An Iterative Design Study in Emergency Preparedness
Yuxuan Li, Sauvik Das, Hirokazu Shirado
TL;DR
The paper addresses how LLM agent simulations can be meaningfully integrated into policy practice under deep uncertainty, using a year-long, stakeholder-engaged design study in emergency preparedness. It adopts an iterative co-design approach with a university emergency preparedness team, culminating in a stadium-scale simulation of ~13,000 agents that informs training, evacuation procedures, and infrastructure planning rather than predicting exact outcomes. The authors identify five design mechanisms (validation, trust bootstrapping, surfacing tacit knowledge via fix-it responses, attention to contextual details, and policy–AI co-evolution) that explain how usefulness emerges in real-world policy contexts. The work argues for moving beyond raw model fidelity toward institutional alignment, showing how simulations can serve as technology probes that support practical sensemaking, planning, and iterative policy refinement with stakeholders.
Abstract
Policymakers must often act under conditions of deep uncertainty, such as emergency response, where predicting the specific impacts of a policy a priori is implausible. Large Language Model (LLM) agent simulations have been proposed as tools to support policymakers under these conditions, yet little is known about how such simulations become useful for real-world policy practice. To address this gap, we conducted a year-long, stakeholder-engaged design process with a university emergency preparedness team. Through iterative design cycles, we developed and refined an LLM agent simulation of a large-scale campus gathering, ultimately scaling to 13,000 agents that modeled crowd movement and communication under various emergency scenarios. Rather than producing predictive forecasts, these simulations supported policy practice by shaping volunteer training, evacuation procedures, and infrastructure planning. Analyzing these findings, we identify three design process implications for making LLM agent simulations that are useful for policy practice: start from verifiable scenarios to bootstrap trust, use preliminary simulations to elicit tacit domain knowledge, and treat simulation capabilities and policy implementation as co-evolving.
