Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, Binbin Zhang, Bin Jia
TL;DR
The paper reports the StutteringSpeech Challenge, the first Mandarin-focused effort to jointly advance Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) for people who stutter (PWS). It uses the AS-70 Mandarin stuttering dataset, repartitioned to avoid overlap in command text, and evaluates SED with multi-label F1 metrics and ASR with character error rate (CER), across three tracks including an open Research track. Baselines (Conformer for SED and U2++ for ASR) establish reproducible benchmarks, while top submissions demonstrate that targeted data augmentation and advanced architectures (e.g., Zipformer, E-Branchformer, Branchformer, BiLSTM augmentations) substantially improve detection and transcription accuracy. The results highlight the potential of tailored models and augmentation strategies to enable more inclusive Mandarin speech technologies, with practical implications for early intervention and everyday speech interfaces for PWS.
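The two metrics named above can be illustrated with a minimal sketch. It assumes the standard definitions: CER as character-level edit distance divided by reference length, and F1 computed separately for each label in a multi-label setting. The function names, the toy label columns, and the example strings are hypothetical, and the challenge's exact scoring script and averaging scheme are not specified in this summary.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here, characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def per_label_f1(y_true, y_pred):
    """F1 for each label column of binary multi-label annotations.

    y_true and y_pred are lists of equal-length 0/1 lists, one row per
    utterance and one column per (placeholder) stuttering event type.
    """
    scores = []
    for k in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    # Toy ASR example: two character edits against a five-character reference.
    print(cer("今天天气好", "今天气很好"))
    # Toy SED example with two placeholder event types.
    print(per_label_f1([[1, 0], [1, 1], [0, 1]],
                       [[1, 0], [0, 1], [0, 0]]))
```

Because an utterance can contain several stuttering event types at once, scoring each label independently keeps a miss on one type from hiding a correct detection of another.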
Abstract
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED, which aims to develop systems for detecting stuttering events; (2) ASR, which focuses on creating robust systems for recognizing stuttered speech; and (3) a Research track for innovative approaches utilizing the provided dataset. We use AS-70, an open-source Mandarin stuttering dataset, which has been split into new training and test sets for the challenge. This paper presents the dataset, details the challenge tracks, and analyzes the performance of the top systems, highlighting improvements in detection accuracy and reductions in recognition error rates. Our findings underscore the potential of specialized models and augmentation strategies in developing stuttered speech technologies.
