Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green

TL;DR

This report describes Project Euphonia's latest advances in data collection and annotation: expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K audio recordings, and gathering a comprehensive set of metadata for over 75% of the speakers in the database.

Abstract

Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and diverse speech corpus. This report describes the project's latest advancements in data collection and annotation methodologies, such as expanding speaker diversity in the database, adding human-reviewed transcript corrections and audio quality tags to 350K (of the 1.2M total) audio recordings, and amassing a comprehensive set of metadata (including more than 40 speech characteristic labels) for over 75% of the speakers in the database. We report on the impact of transcript corrections on our machine-learning (ML) research, inter-rater variability of assessments of disordered speech patterns, and our rationale for gathering speech metadata. We also consider the limitations of using automated off-the-shelf annotation methods for assessing disordered speech.

Paper Structure (12 sections, 2 figures)

Figures (2)

  • Figure 1: Analyzed VAD negative decisions grouped by etiology. False omission rate is labeled for each etiology. (Top) utterance counts, (bottom) speaker counts.
  • Figure 2: Estimates of inter-rater reliability for nine of the disordered speech labels.