Real-World Design and Deployment of an Embedded GenAI-powered 9-1-1 Calltaking Training System: Experiences and Lessons Learned

Zirong Chen; Meiyi Ma

TL;DR

This paper reports a real-world deployment of an embedded GenAI-powered 9-1-1 calltaking training system at MNDEC, addressing the training bottleneck faced by emergency response centers. It combines realistic caller simulation and formalized protocol verification to automate scenario generation and performance assessment across 57 incident types and 100 caller personas, with 1,651 protocol requirements. Over six months, 190 users completed 1,120 sessions, generating rich deployment logs that support four actionable lessons: iterative, co-architected development; hybrid formal methods for safety-critical rigor; triangulated feedback to separate genuine failures from stress effects; and constructive explanations that sustain calibrated difficulty. The work offers concrete guidance for responsibly embedding AI in safety-critical public sector environments where governance, accountability, and organizational dynamics shape adoption as much as algorithmic performance.

Abstract

Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, deployment scaled from initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to embed AI-driven training systems in safety-critical public sector environments where embedded constraints fundamentally shape socio-technical design.

Paper Structure (24 sections, 1 equation, 6 figures)

This paper contains 24 sections, 1 equation, 6 figures.

Introduction
Operational Context
System Overview
Iterative Design, Develop, and Deploy with Human in the Loop
Addressing Potential Ethical Concerns
Experiences & Lessons Learned
Bridging Knowledge Gaps Through Iterative Delivery
Observations
Approach and Outcomes
Actionable Practices
Enhancing Rigor with Formal Methods
Observations
Approach and Outcomes
Actionable Practice
Building Resilience via Triangulated Feedback
...and 9 more sections

Figures (6)

Figure 1: Workstation view of the deployed training system at a municipal 9-1-1 communications center.
Figure 2: System workflow showing training assignment, cloud-based caller simulation, real-time performance monitoring, automated debriefing, quality assurance review, and curriculum synchronization.
Figure 3: Continuously iterative design-develop-deploy workflow. The three phases operate concurrently, with all-around stakeholder participation throughout.
Figure 4: Examples of runtime checks encoded with the LLM-backed DETECT predicate. The last six examples are conditioned on the context. All $\tau$ are adaptable hyper-parameters for different call-taking requirements.
Figure 5: Dispute rates (noted as 'phantom error'), in which users attribute their mistakes to the system, differ across experience and performance levels (114 trainees, 1,120 completed sessions).
...and 1 more figures
