Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services

Danush Venkateshperumal, Rahman Abdul Rafi, Shakil Ahmed, Ashfaq Khokhar

TL;DR

The paper tackles the challenge of unreliable VoIP communications in emergency services by deploying an LLM-driven system that reconstructs fragmented speech, fills contextual gaps, and prioritizes calls using a Retrieval-Augmented Generation (RAG) framework. It integrates real-time transcription via Twilio Media Stream and AssemblyAI with a RAG-based response generator and a rule-based severity classifier augmented by a DistilBERT emotion model. Key contributions include handling fragmented speech, contextual gap filling, dynamic prioritization, RAG-driven predictive responses, and API integrations for end-to-end operation. The findings indicate the approach can improve dispatch accuracy and reduce response delays in real-world emergencies, though limitations remain under severe network degradation and in language coverage; proposed future work includes multilingual expansion and deeper field testing with emergency services.

Abstract

Emergency communication systems face disruptions due to packet loss, bandwidth constraints, poor signal quality, delays, and jitter in VoIP systems, leading to degraded real-time service quality. Victims in distress often struggle to convey critical information due to panic, speech disorders, and background noise, further complicating dispatchers' ability to assess situations accurately. Staffing shortages in emergency centers exacerbate delays in coordination and assistance. This paper proposes leveraging Large Language Models (LLMs) to address these challenges by reconstructing incomplete speech, filling contextual gaps, and prioritizing calls based on severity. The system integrates real-time transcription with Retrieval-Augmented Generation (RAG) to generate contextual responses, using Twilio and AssemblyAI APIs for seamless implementation. Evaluation shows high precision, favorable BLEU and ROUGE scores, and alignment with real-world needs, demonstrating the model's potential to optimize emergency response workflows and prioritize critical cases effectively.

Paper Structure

This paper contains 30 sections, 19 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Potential Use Cases
  • Figure 2: Problem statement scenario
  • Figure 3: Data description and preprocessing steps
  • Figure 4: System Design
  • Figure 5: Scores from various emergency scenarios
  • ...and 3 more figures