Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition

Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann

TL;DR

This paper addresses accessibility gaps in real-time captioning for d/Deaf and hard of hearing (DHH) individuals by evaluating a semi-automated CART workflow in which non-professional editors collaboratively correct ASR output as speech unfolds. Using a mixed-methods design with a user study (n=75) and complementary DHH focus groups (n=25), the authors assess the feasibility, accuracy, and perceived understandability of the approach. They observe a drop in Word Error Rate (WER) from 9.3% to 6.2% after collaborative editing. DHH readers rate the corrected transcripts as adequately understandable, especially at lower WER levels, while the authors note the cognitive load on editors and the importance of minimal latency. The findings support a scalable, semi-automated CART workflow as a viable intermediate solution that complements ASR and traditional CART, and they point to practical design considerations and future AI assistance to help manage editing tasks and improve knowledge transfer.
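For readers unfamiliar with the metric, the following is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are illustrative assumptions, and the paper's own scoring pipeline (for example its text normalization) is not described in this summary and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization with no normalization of case
    or punctuation; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors in 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # The reported drop from 9.3% to 6.2% WER, expressed as a relative reduction.
    print(relative_reduction(9.3, 6.2))
```

Under this definition, the reported change from 9.3% to 6.2% corresponds to an absolute reduction of 3.1 percentage points, or roughly one third of the original errors.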

Abstract

Communication access real-time translation (CART) is an essential accessibility service for d/Deaf and hard of hearing (DHH) individuals, but the cost and scarcity of trained personnel limit its availability. While Automatic Speech Recognition (ASR) offers a cheap and scalable alternative, transcription errors can lead to serious accessibility issues. Real-time correction of ASR by non-professionals presents an under-explored CART workflow that addresses these limitations. We conducted a user study with 75 participants to evaluate the feasibility and efficiency of this workflow. In addition, we held focus groups with 25 DHH individuals to identify acceptable accuracy levels and factors affecting the accessibility of real-time captioning. Results suggest that collaborative editing can improve transcription accuracy to the extent that DHH users rate it positively regarding understandability. Focus groups also showed that human effort to improve captioning is highly valued, supporting a semi-automated approach as an alternative to stand-alone ASR and traditional CART services.

Paper Structure

This paper contains 27 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Architecture and user interface of the prototype
  • Figure 2: Word Error Rate by editing scenario