ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li
TL;DR
The ICMC-ASR paper tackles robust in-car speech recognition under realistic driving conditions, using a large-scale multi-channel Mandarin dataset that covers diverse driving scenarios and is accompanied by recorded noise for data augmentation. It defines two tracks, ASR and ASDR, evaluated with CER and cpCER respectively, and reports that the winning team reached a CER of 13.16% and a cpCER of 21.48%, a substantial improvement over the challenge baseline. Key contributions include the dataset, the track definitions, and an analysis of effective techniques spanning speech frontends, SSL-based ASR backbones such as HuBERT, and diarization enhancements such as multi-channel TS-VAD. The work offers a public benchmark for in-car multi-speaker recognition and diarization, driving progress toward reliable human-vehicle interaction in challenging acoustic environments.
Abstract
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, the first-place team, USTCiflytek, achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing absolute improvements of 13.08% and 51.4% over our challenge baseline, respectively.
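To make the two metrics concrete, the following is a minimal sketch of how CER and cpCER are commonly computed. It is not the challenge's official scoring script: the function names and the brute-force search over speaker permutations are assumptions made for illustration, and text normalization (punctuation, digits, and so on) is omitted.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER: edit distance divided by the number of reference characters."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(refs: list[str], hyps: list[str]) -> float:
    """cpCER sketch: refs and hyps hold one concatenated transcript per
    speaker. Hypothesis speakers are tried in every order and the
    assignment with the fewest total character errors is scored."""
    n = max(len(refs), len(hyps))
    # Pad with empty transcripts so missing or extra speakers count as errors.
    refs = refs + [""] * (n - len(refs))
    hyps = hyps + [""] * (n - len(hyps))
    total_ref_chars = sum(len(r) for r in refs)
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars
```

The exhaustive permutation search grows factorially with the number of speakers, so it is only practical for a handful of speakers per session; an assignment solver such as the Hungarian algorithm would be the usual substitute for larger speaker counts.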
