The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al.
Nicolad Garneau, Olivier Bolduc
TL;DR
This study assesses how commercial and open-source French-Canadian ASR systems can support court transcription. It benchmarks three models on a curated legal dataset and evaluates them with the Word Error Rate, $WER = (I + D + S)/N$, where $I$, $D$, and $S$ are the numbers of inserted, deleted, and substituted words and $N$ is the number of words in the reference transcript, alongside the Sonnex Distance for phonetic accuracy. It finds that AWS Transcribe generally yields the lowest $WER$ across two corpora, while OpenAI Whisper is strongest at minimizing insertions and Google Cloud's Chirp 2 gives competitive results; all models still require careful post-editing to produce legal-grade transcripts. Beyond performance, the paper analyzes the broader implications for court reporters, copyists, litigants, and the justice system, highlighting efficiency gains, potential job displacement, privacy concerns, and the need for domain-specific refinement and governance. The work shows the practical potential of ASR to improve access to justice and reduce transcription costs, while underscoring the need to address accuracy, security, and workforce evolution in the legal domain.
Abstract
In Quebec and Canadian courts, the transcription of court proceedings is a critical task for appeal purposes and must be certified by an official court reporter. The limited availability of qualified reporters and the high costs associated with manual transcription underscore the need for more efficient solutions. This paper examines the potential of Automatic Speech Recognition (ASR) systems to assist court reporters in transcribing legal proceedings. We benchmark three ASR models, including commercial and open-source options, on their ability to recognize French legal speech using a curated dataset. Our study evaluates the performance of these systems using the Word Error Rate (WER) metric and introduces the Sonnex Distance to account for phonetic accuracy. We also explore the broader implications of ASR adoption on court reporters, copyists, the legal system, and litigants, identifying both positive and negative impacts. The findings suggest that while current ASR systems show promise, they require further refinement to meet the specific needs of the legal domain.
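To make the WER metric concrete, the following is a minimal sketch of how $WER = (I + D + S)/N$ can be computed from a word-level Levenshtein alignment between a reference transcript and an ASR hypothesis. It is an illustration and not the authors' evaluation code: the function name `word_error_rate`, the `WerResult` container, and the plain whitespace tokenization without any text normalization are all assumptions made for this example.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    """Edit counts from aligning a hypothesis to a reference (hypothetical helper)."""
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (I + D + S) / N, with N the number of reference words.
        return (self.insertions + self.deletions + self.substitutions) / self.ref_words


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    # Assumption: whitespace tokenization, no casing or punctuation normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    n, m = len(ref), len(hyp)

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Backtrace one optimal alignment to split the total into S, D, and I.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerResult(subs, dels, ins, n)


if __name__ == "__main__":
    result = word_error_rate("la cour suprême du canada", "la cour supreme canada")
    print(result, result.wer)
```

Several optimal alignments can exist for the same pair of sentences, so the split between substitutions, deletions, and insertions may differ between tools even though the total error count, and therefore the WER, is the same. Because insertions count as errors while $N$ only counts reference words, WER can exceed 1 when the hypothesis is much longer than the reference.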
