Speech Recognition for Analysis of Police Radio Communication

Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul

TL;DR

Both human and machine transcription are challenging in this domain: large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance.

Abstract

Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) with modern recognition models. We evaluate off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription are challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our findings point to directions for future work, including analysis of short utterances and of potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers to enable further research on the recognition and analysis of police communication.
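Comparing machine transcripts against human ones is usually scored with word error rate (WER), the metric plotted in Figure 2: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a generic illustration, not the authors' scoring code; the lowercase-and-split tokenization is an assumption, since this summary does not describe the paper's text normalization. It aggregates errors over the whole set (corpus-level WER), which keeps very short utterances from dominating the average.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 ref words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) transcript pairs.

    Assumption: simple lowercase + whitespace tokenization; real evaluations
    typically apply a task-specific normalizer first.
    """
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        total_errors += word_edit_distance(ref, hyp)
        total_words += len(ref)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_errors / total_words
```

For example, scoring the hypothetical pairs ("unit twelve copy that", "unit twelve copy") and ("ten four", "ten for") yields 2 errors over 6 reference words, a WER of about 0.33.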

Paper Structure

This paper contains 20 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Duration distribution of training-set utterances up to 10 seconds long (96.5% of all training-set utterances).
  • Figure 2: Word error rate (WER) vs. audio quality, measured as scale-invariant signal-to-distortion ratio (SI-SDR), and vs. utterance duration for NeMo FastConformer CTC (616M) on the dev set.
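Figure 2 uses SI-SDR as its audio-quality axis. The sketch below shows the standard definition given a clean reference signal: remove the mean from both signals, project the estimate onto the reference to get the target component, and report the target-to-residual energy ratio in dB. The caption does not say how SI-SDR was obtained for radio transmissions, which have no clean reference, so a reference-free estimator may have been used; treat this only as an illustration of what the quantity measures. Representing signals as plain lists of floats is an assumption made to keep the example dependency-free.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the DC offset from both signals.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference must not be constant")
    # Optimal scaling of the reference: makes the score invariant to gain.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf   # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)
```

For instance, with reference [1, -1, 1, -1] and estimate [1.1, -0.9, 0.9, -1.1], the target energy is 4 and the residual energy is 0.04, giving 20 dB; doubling the estimate leaves the score unchanged, which is the point of the scale-invariant form.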