Speech Recognition for Analysis of Police Radio Communication

Tejes Srivastava; Ju-Chieh Chou; Priyank Shroff; Karen Livescu; Christopher Graziul

TL;DR

Both human and machine transcription are challenging in this domain: large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance.

Abstract

Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) using modern recognition models. We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription are challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our results suggest directions for future work, including analysis of short utterances and potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers, to enable further research on recognition and analysis of police communication.

Paper Structure (20 sections, 2 figures, 2 tables)

Introduction
Background
Domain characteristics of police radio speech
Speech recognition for naturalistic audio
Data
Data collection and annotation process
Post-processing steps
Inter-annotator agreement
Experimental setup
Off-the-shelf ASR models
Fine-tuned ASR models
Customized E2E ASR models
Results and analysis
ASR performance
Error analysis
...and 5 more sections

Figures (2)

Figure 1: Duration distribution of utterances in our training set, for utterances of duration up to 10 seconds (96.5% of the train set utterances).
Figure 2: WER vs. audio quality (SI-SDR) and utterance duration for NeMo FastConformer CTC (616M) on the dev set.
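
The Figure 2 caption reports word error rate (WER), the standard ASR metric: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate`, the whitespace tokenization, and the example utterances are illustrative assumptions; the paper's own text normalization is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example utterances, not taken from the corpus.
print(word_error_rate("ten four copy that", "ten four copy"))
```

Because the denominator is the reference length, a single error in a very short utterance yields a large WER, which is one reason short utterances merit separate analysis.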
