ICASSP 2024 Speech Signal Improvement Challenge
Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets
TL;DR
The paper introduces the ICASSP 2024 Speech Signal Improvement Grand Challenge, which aims to advance speech quality in mainstream communication systems and to benchmark methods under a unified evaluation framework. It extends the 2023 SIG Challenge with a dataset synthesizer that gives participants a higher starting baseline, the SIGMOS objective metric aligned with the extended P.804 tests, transcripts of the 2023 test set for Word Accuracy evaluation, and Word Accuracy (WAcc) as an additional metric. Evaluation uses a blind set of 500 clips spanning devices, environments, and languages; subjective P.804 MOS ratings are collected via MTurk, and WAcc is computed with Azure. The Final Score is $((SIG-1)/4 + (OVRL-1)/4 + WAcc)/3$, which rescales the SIG and OVRL MOS values from the 1 to 5 scale onto $[0, 1]$ before averaging them with WAcc. Results show statistically significant differences among the top entrants in both the real-time and non-real-time tracks, illustrating the practical impact of the proposed enhancements on real-world speech-quality improvement.
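The Final Score formula can be illustrated with a minimal sketch. The function name `final_score`, the argument names, and the range checks are assumptions made for illustration and are not taken from the challenge's own tooling; the sketch also assumes SIG and OVRL are MOS values on a 1 to 5 scale and WAcc is a fraction in [0, 1].

```python
def final_score(sig: float, ovrl: float, wacc: float) -> float:
    """Combine SIG MOS, OVRL MOS, and Word Accuracy into one score in [0, 1].

    Assumes sig and ovrl are MOS values in [1, 5] and wacc is in [0, 1].
    """
    # Range checks are an illustrative addition, not part of the formula.
    for name, value in (("sig", sig), ("ovrl", ovrl)):
        if not 1.0 <= value <= 5.0:
            raise ValueError(f"{name} must be a MOS value in [1, 5], got {value}")
    if not 0.0 <= wacc <= 1.0:
        raise ValueError(f"wacc must be in [0, 1], got {wacc}")

    # Rescale each MOS value from [1, 5] to [0, 1].
    sig_norm = (sig - 1.0) / 4.0
    ovrl_norm = (ovrl - 1.0) / 4.0

    # Equal-weight average of the three normalized terms.
    return (sig_norm + ovrl_norm + wacc) / 3.0


if __name__ == "__main__":
    # Hypothetical input values, chosen only to exercise the formula.
    print(final_score(sig=3.0, ovrl=3.0, wacc=0.5))
```

Because all three terms are weighted equally after rescaling, a gain of 0.4 MOS points in SIG or OVRL moves the score by the same amount as a gain of 0.1 in WAcc.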
Abstract
The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research on improving speech signal quality in communication systems. This is our second challenge, building on the success of the ICASSP 2023 Grand Challenge. We enhance the competition by introducing a dataset synthesizer that enables all participating teams to start from a higher baseline, an objective metric for our extended P.804 tests, and transcripts for the 2023 test set, and by adding Word Accuracy (WAcc) as a metric. We evaluate a total of 13 systems in the real-time track and 11 systems in the non-real-time track using both subjective P.804 and objective Word Accuracy metrics.
