Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe
TL;DR
ML-SUPERB 2023 broadens the evaluation of multilingual self-supervised learning (SSL) by introducing three tracks (Research, Challenge, New Language) and expanding language coverage to 154 languages. It demonstrates that scaling up models is not the sole path to multilingual proficiency and that diverse speech types and resource variability significantly impact performance. The challenge showcases a range of approaches, including MMS-1b, XLSR-128, and WavLabLM variants, with multilingual SSL generally outperforming monolingual baselines and efficient, data-diverse strategies proving competitive. The New Language Track further adds low-resource languages to the benchmark, underlining the practical impact of ML-SUPERB as a collaborative, evolving platform for multilingual speech representation research.
Abstract
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that a variety of speech/voice types present significant challenges in multilingual speech processing.
