Real-World Design and Deployment of an Embedded GenAI-powered 9-1-1 Calltaking Training System: Experiences and Lessons Learned
Zirong Chen, Meiyi Ma
TL;DR
This paper reports a real-world deployment of an embedded GenAI-powered 9-1-1 calltaking training system at the Metro Nashville Department of Emergency Communications (MNDEC), addressing the training bottleneck faced by emergency response centers. The system combines realistic caller simulation with formalized protocol verification to automate scenario generation and performance assessment, covering 57 incident types, 100 caller personas, and 1,651 protocol requirements. Over six months, 190 users completed 1,120 sessions, generating rich deployment logs that support four actionable lessons: iterative, co-architected development; hybrid formal methods for safety-critical rigor; triangulated feedback to separate genuine failures from stress effects; and constructive explanations that sustain calibrated difficulty. The work offers concrete guidance for responsibly embedding AI in safety-critical public sector environments, where governance, accountability, and organizational dynamics shape adoption as much as algorithmic performance does.
Abstract
Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with the Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, the deployment scaled from an initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to embed AI-driven training systems in safety-critical public sector environments where embedded constraints fundamentally shape socio-technical design.
