Transfer Learning of Transformer-based Speech Recognition Models from Czech to Slovak
Jan Lehečka, Josef V. Psutka, Josef Psutka
TL;DR
This paper investigates transfer learning for Slovak ASR by bootstrapping Slovak models from a Czech pre-trained Wav2Vec 2.0 model. It compares several pre-training initializations and large multilingual baselines on three Slovak datasets (CommonVoice, VoxPopuli, MALACH), using grapheme-level CTC decoding with and without an external language model. The findings show that Czech-to-Slovak transfer generally reduces WER, especially as the amount of target data grows; that monolingual Slovak pre-training can rival multilingual models on some datasets; and that larger multilingual baselines do not always win, which highlights the value of language-adaptive initialization. The work demonstrates data-efficient cross-language transfer between closely related languages and publicly releases the Slovak model W2V2-cs-sk, initialized from the Czech model, to support reproducibility and energy-efficient deployment.
Abstract
In this paper, we compare several methods of training Slovak speech recognition models based on the Transformer architecture. Specifically, we explore transfer learning from an existing Czech pre-trained Wav2Vec 2.0 model to Slovak. We demonstrate the benefits of the proposed approach on three Slovak datasets. Our Slovak models achieved the best results when their weights were initialized from the Czech model at the beginning of the pre-training phase. Our results show that the knowledge stored in the Czech pre-trained model can be successfully reused to solve tasks in Slovak, outperforming even much larger public multilingual models.
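
The comparisons summarized above are reported as word error rate (WER). As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the function below divides the word-level edit distance between a reference transcript and a recognized hypothesis by the number of reference words. The function name `word_error_rate` and the Slovak example sentence are illustrative assumptions, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only: whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over five words.
print(word_error_rate("dobrý deň ako sa máš", "dobrý den ako máš"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.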
