Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui, Daxin Tan, Yifan Yang, Dingdong Wang, Huimeng Wang, Xiao Chen, Xie Chen, Xunying Liu
TL;DR
This work assesses the viability of discrete tokens generated by self-supervised learning (SSL) models for multilingual ASR by comparing tokens from XLSR-53, WavLM-Large, and EnCodec across seven languages using a Zipformer-Transducer end-to-end model. It demonstrates that discrete tokens can match or exceed Fbank-based systems, with a notable improvement on Polish and overall WER reductions on the dev and test sets. The study also analyzes data augmentation, tokenization strategies, and cross-language generalization, showing that monolingual tokenization with a larger cluster count generally outperforms multilingual shared-token approaches. Additionally, discrete tokens substantially reduce training time, a practical benefit for rapid development and deployment of multilingual ASR systems.
Abstract
With the advancement of self-supervised learning (SSL) in speech-related tasks, there has been growing interest in using discrete tokens generated by SSL models for automatic speech recognition (ASR), as they offer faster processing. However, previous studies have primarily focused on multilingual ASR with Fbank features or English ASR with discrete tokens, leaving a gap in adapting discrete tokens to multilingual ASR scenarios. This study presents a comprehensive comparison of discrete tokens generated by various leading SSL models across multiple language domains. We aim to explore the performance and efficiency of speech discrete tokens in both monolingual and multilingual ASR scenarios. Experimental results demonstrate that discrete tokens achieve results comparable to systems trained on Fbank features in ASR tasks across seven language domains, with average word error rate (WER) reductions of 0.31% and 1.76% absolute (2.80% and 15.70% relative) on the dev and test sets respectively, and a particularly large WER reduction of 6.82% absolute (41.48% relative) on the Polish test set.
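To make the relationship between the absolute and relative figures concrete, the following minimal sketch back-calculates the WERs implied by the Polish test-set numbers. The function names are hypothetical, and the resulting baseline and discrete-token WERs are derived from the two reported reductions, not values stated in the abstract, so they are only approximate because of rounding. The same arithmetic may not hold exactly for the seven-language averages if the relative figure was averaged per language.

```python
def implied_wers(abs_reduction, rel_reduction_pct):
    """Back-calculate (baseline_wer, new_wer) in percent.

    Assumes rel_reduction_pct = 100 * abs_reduction / baseline_wer,
    which is the usual definition of relative WER reduction.
    """
    baseline = abs_reduction / (rel_reduction_pct / 100.0)
    return baseline, baseline - abs_reduction


def relative_reduction_pct(baseline_wer, new_wer):
    """Relative WER reduction in percent, given two WERs in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Polish test set: 6.82% absolute, 41.48% relative (figures from the abstract).
polish_baseline, polish_discrete = implied_wers(6.82, 41.48)
print(f"implied Fbank WER:          {polish_baseline:.2f}%")
print(f"implied discrete-token WER: {polish_discrete:.2f}%")

# Sanity check: recomputing the relative reduction recovers the reported value.
print(f"relative reduction:         "
      f"{relative_reduction_pct(polish_baseline, polish_discrete):.2f}%")
```

Under these assumptions, the Polish Fbank baseline comes out at roughly 16.4% WER and the discrete-token system at roughly 9.6%.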
