Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke
TL;DR
The paper addresses domain adaptation of ASR under limited labeled data and computational resources by introducing a robust data-selection pipeline that filters pseudo-labels generated by Whisper and Zipformer. It combines WER prediction, NER-based selection, and CER-based filtering to curate a small, high-quality subset of pseudo-labeled data for fine-tuning. On Wow and Fisher English, fine-tuning on 1–5% of the pseudo-labeled data, chosen with CER-based or multi-criteria selection, matches or surpasses fine-tuning on the full dataset, enabling efficient domain adaptation. The approach suits real-world deployments where annotation and compute budgets are constrained, and it adapts to evolving acoustic and lexical properties to maintain accuracy.
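To make the multi-criteria idea concrete, the sketch below chains the three filters over precomputed per-segment scores. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Segment` fields, the thresholds (`max_pred_wer`, `max_cer`), the entity-first ranking, and the hour budget are all hypothetical, and the scores are assumed to come from separate WER-prediction, NER, and cross-system CER steps.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """Hypothetical per-segment record with precomputed selection scores."""
    seg_id: str
    duration_s: float        # segment length in seconds
    predicted_wer: float     # from a WER-prediction model (assumed in [0, 1])
    num_entities: int        # named entities an NER tagger found in the pseudo-label
    cross_system_cer: float  # CER disagreement between ASR hypotheses (assumed in [0, 1])

def select_segments(segments: List[Segment],
                    max_pred_wer: float = 0.15,
                    max_cer: float = 0.10,
                    budget_hours: float = 100.0) -> List[Segment]:
    """Hypothetical multi-stage filter: thresholds, an entity ranking, an hour budget."""
    # Stage 1: drop segments the WER predictor expects to be poorly transcribed.
    kept = [s for s in segments if s.predicted_wer <= max_pred_wer]
    # Stage 2: drop segments on which the ASR systems disagree too much.
    kept = [s for s in kept if s.cross_system_cer <= max_cer]
    # Stage 3: prefer entity-rich segments, breaking ties by lower predicted WER.
    kept.sort(key=lambda s: (-s.num_entities, s.predicted_wer))
    # Greedily fill the duration budget, skipping segments that would overflow it.
    budget_s = budget_hours * 3600.0
    selected: List[Segment] = []
    total_s = 0.0
    for s in kept:
        if total_s + s.duration_s <= budget_s:
            selected.append(s)
            total_s += s.duration_s
    return selected
```

Ranking entity-rich segments first is only one plausible reading of NER-based selection; a hard filter that keeps only segments containing entities would be an equally simple alternative.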
Abstract
Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple selection strategies, including word error rate (WER) prediction, named entity recognition (NER), and character error rate (CER) analysis, to extract high-quality training segments. We evaluate our method on Whisper and Zipformer against a 7500-hour baseline and compare it with a CER-based approach that relies on hypotheses from three ASR systems. Fine-tuning on 7500 hours of pseudo-labeled call-center data achieves 12.3% WER, while our filtering reduces the dataset to 100 hours (1.4%) with similar performance; a similar trend holds on Fisher English.
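The CER-based selection can be illustrated with a small agreement check across hypotheses from several ASR systems. This is a minimal sketch assuming that one system's output serves as the pseudo-label and that each other system's hypothesis is scored by character-level CER against it; the system names, the 0.1 threshold, and the function names are hypothetical, and the exact criterion used in the paper may differ.

```python
from typing import Dict

def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def agrees(hyps: Dict[str, str], primary: str, max_cer: float = 0.1) -> bool:
    """Hypothetical agreement test: keep a segment only if every other system's
    hypothesis is within max_cer of the primary system's pseudo-label."""
    ref = hyps[primary]
    if not ref.strip():
        return False  # discard empty pseudo-labels outright
    return all(cer(ref, hyp) <= max_cer
               for name, hyp in hyps.items() if name != primary)

# Usage with three hypothetical systems:
segment = {"sys_a": "hello world", "sys_b": "hello word", "sys_c": "hello world"}
print(agrees(segment, primary="sys_a"))  # prints True
```

In practice the hypotheses would usually be text-normalized (casing, punctuation) before computing CER, so that formatting differences between systems are not counted as disagreement.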
