
Lost in Transcription: Subtitle Errors in Automatic Speech Recognition Reduce Speaker and Content Evaluations

Kowe Kadoma, Priyal Shrivastava, Mor Naaman

Abstract

Researchers have demonstrated that Automatic Speech Recognition (ASR) systems perform differently across demographic groups. In this work, we examined how subtitle errors affect evaluations of speakers and their content using a preregistered online experiment (N=207, U.S.-based crowdworkers). Participants watched speakers with various accents deliver a talk in which the subtitles were either accurate or error-prone. Our results indicate that error-prone subtitles consistently reduce both speaker and content evaluations for all speakers. Controlling for subtitle quality, we did not see disparate impact between the accent groups. Taken together, though, the findings of this short paper imply that speakers with accents for which ASR systems perform poorly are likely to be further penalized, because viewers give them lower evaluations.

Paper Structure

This paper contains 14 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: An overview of the experimental procedure.
  • Figure 2: The impact of subtitle quality on the speaker and content evaluations.
  • Figure 3: Audio Analysis. All of the voices sounded natural, and there were subtle differences between the accent groups.
  • Figure 4: Audio Video Synchronization. All of the videos had a medium synchronization.