Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Jiatong Shi; William Chen; Dan Berrebbi; Hsiu-Hsuan Wang; Wei-Ping Huang; En-Pei Hu; Ho-Lam Chuang; Xuankai Chang; Yuxun Tang; Shang-Wen Li; Abdelrahman Mohamed; Hung-yi Lee; Shinji Watanabe

TL;DR

ML-SUPERB 2023 broadens the multilingual self-supervised learning evaluation landscape by introducing three tracks (Research, Challenge, New Language) and expanding language coverage to 154 languages. It demonstrates that scaling up models is not the sole path to multilingual proficiency and that diverse speech types and resource variability significantly impact performance. The challenge showcases a range of approaches, including MMS-1b, XLSR-128, and WavLabLM variants, with multilingual SSL generally outperforming monolingual baselines and efficient, data-diverse strategies proving competitive. The New Language Track further injects low-resource languages into the benchmark, underlining the practical impact of ML-SUPERB as a collaborative, evolving platform for multilingual speech representation research.

Abstract

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that a variety of speech and voice types present significant challenges in multilingual speech processing.

Paper Structure (13 sections, 2 figures, 4 tables)

Introduction
Background
SUPERB and its Challenges
Multilingual Speech Self-supervised Representation
Tracks in ML-SUPERB Challenge
Challenge Track
New Language Track
Submissions
New Language Track Submissions
Challenge Submissions
Challenge Results Summary
Conclusion
Acknowledgements

Figures (2)

Figure 1: Geographical distribution of the New Language Track submissions. The 45 languages are marked on a map at the approximate locations where they are spoken.
Figure 2: MACs vs. SUPERB score on the ML-SUPERB 1-hour hidden benchmark.
