CDSD: Chinese Dysarthria Speech Database
Yan Wang, Mengyi Sun, Xinchen Kang, Jingting Li, Pengfei Guo, Ming Gao, Su-Jing Wang
TL;DR
The paper introduces the Chinese Dysarthria Speech Database (CDSD), the largest Chinese dysarthria dataset to date, with 133 hours of recordings from 44 speakers. It targets the data bottleneck that limits Dysarthric Speech Recognition (DSR) for Chinese. CDSD is organized into two parts (A and B), and the paper details the data collection and annotation procedures, including audio-visual data for a subset and standardized transcription. Using an end-to-end ESPnet Conformer ASR model with Fbank and Wav2vec features and multi-stage pre-training on AISHELL-1, AISHELL-2, and WenetSpeech, the study reports a best Character Error Rate (CER) of 16.4% on Part A and shows that computer-based DSR can outperform humans in several cases. It also finds that roughly 4–5 hours of data are needed for effective speaker-dependent fine-tuning. By offering a large, diverse, and publicly accessible resource, CDSD aims to catalyze progress in Chinese DSR and may help improve communication aids for individuals with dysarthria in China.
Abstract
Dysarthric speech poses significant challenges for individuals with dysarthria, limiting their ability to communicate socially. Despite the widespread use of Automatic Speech Recognition (ASR), accurately recognizing dysarthric speech remains a formidable task, largely because dysarthric speech data are scarce. To address this gap, we developed the Chinese Dysarthria Speech Database (CDSD), the most extensive collection of Chinese dysarthria data to date, featuring 133 hours of recordings from 44 speakers. Our benchmarks yield a best Character Error Rate (CER) of 16.4%, compared with a CER of 20.45% in our additional human experiments. This result shows the potential of Dysarthric Speech Recognition (DSR) to significantly improve communication for individuals with dysarthria. The CDSD database will be made publicly available at http://melab.psych.ac.cn/CDSD.html.
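For readers unfamiliar with the metric, the sketch below shows how a corpus-level CER is conventionally computed: the character-level edit distance between each reference and hypothesis, summed over all utterances and divided by the total number of reference characters. This follows the standard definition and is not necessarily the exact scoring pipeline used for the results above. The function names `edit_distance` and `cer` and the example sentences are illustrative assumptions, not part of CDSD.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a six-character reference.
    refs = ["今天天气很好"]
    hyps = ["今天天汽很好"]
    print(f"CER = {cer(refs, hyps):.2%}")
```

Because insertions count as errors, CER computed this way can exceed 100% when a hypothesis is much longer than its reference. Scoring scripts also typically strip whitespace and punctuation before comparison. That normalization step is omitted here and would need to match whatever convention the benchmark uses.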
