Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Quy-Anh Dang; Chris Ngo

Abstract

We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored to the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B exclusively on publicly available speech corpora, using a balanced sampling strategy that equalizes the number of training utterances per language. We deliberately omit language-tag conditioning, so the model must learn to identify the language implicitly from the audio. On 12 benchmarks spanning the four target languages, Polyglot-Lion-1.7B achieves an average error rate (WER/CER) of 14.85, competitive with the 14.32 of MERaLiON-2-10B-ASR, a model roughly 6x larger. Training Polyglot-Lion-1.7B costs $81 on a single RTX PRO 6000 GPU, compared to $18,862 for the 128-GPU baseline. Inference is approximately 20x faster than MERaLiON: 0.10 s/sample versus 2.02 s/sample. These results demonstrate that linguistically balanced fine-tuning of moderate-scale pretrained models can yield deployment-ready multilingual ASR at a fraction of the cost of larger specialist systems.
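The balanced sampling idea can be illustrated with a short sketch. It is a minimal, hypothetical implementation, not the authors' code: the function name `balance_by_language`, the `per_language` target, and the policy of subsampling large languages without replacement while oversampling small ones with replacement are all assumptions made for illustration. The language key is kept only for bookkeeping, mirroring the decision not to feed a language tag to the model.

```python
import random
from typing import Dict, List, Tuple


def balance_by_language(
    corpus: Dict[str, List[str]],
    per_language: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return a shuffled list of (language, utterance_id) pairs containing
    exactly `per_language` entries for every language in `corpus`.

    Hypothetical policy (not taken from the paper): languages with at least
    `per_language` utterances are subsampled without replacement; smaller
    languages keep every utterance once and are topped up by resampling.
    """
    rng = random.Random(seed)
    mixed: List[Tuple[str, str]] = []
    for lang, utts in sorted(corpus.items()):  # sorted for determinism
        if not utts:
            raise ValueError(f"no utterances for language {lang!r}")
        if len(utts) >= per_language:
            chosen = rng.sample(utts, per_language)
        else:
            # Keep each original utterance once, then oversample to the target.
            chosen = list(utts) + rng.choices(utts, k=per_language - len(utts))
        # The language key is returned for bookkeeping only; in this sketch
        # it is never turned into a language tag for the model input.
        mixed.extend((lang, u) for u in chosen)
    rng.shuffle(mixed)  # interleave languages across the training stream
    return mixed
```

For example, with a hypothetical corpus of 1,000 English, 500 Mandarin, 50 Tamil, and 200 Malay utterance IDs and `per_language=300`, every language contributes exactly 300 items (1,200 in total), and all 50 Tamil originals appear alongside resampled duplicates. Oversampling repeats scarce-language audio; capping every language at the smallest count avoids repetition at the cost of discarding data from the larger languages.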

Paper Structure (41 sections, 2 equations, 1 figure, 5 tables, 1 algorithm)

Introduction
Related Work
  Large-scale multilingual ASR.
  Audio-language models.
  Southeast Asian and Singapore ASR.
  Multilingual training balance.
  Language identification in ASR.
Datasets
  English.
  Mandarin.
  Tamil.
  Malay.
  Data statistics and imbalance.
  Preprocessing.
Method
...and 26 more sections

Figures (1)

Figure 1: Polyglot-Lion achieves near-SOTA accuracy at a fraction of the model size and inference cost. Left: average error rate (WER/CER) across 12 benchmarks; lower is better. Right: inference speed in seconds per sample; lower is better. Despite having roughly 6x fewer parameters than MERaLiON-2-10B-ASR, Polyglot-Lion-1.7B nearly matches its accuracy (14.85 vs. 14.32 average error rate) while being approximately 20x faster at inference.
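The headline ratios in the abstract and this caption follow directly from the reported numbers. The short calculation below reproduces them under two stated assumptions: parameter counts are the nominal sizes read off the model names (10B and 1.7B), and the "128-GPU baseline" whose training cost is quoted is taken to be MERaLiON-2-10B-ASR. The function name `headline_ratios` is hypothetical.

```python
def headline_ratios() -> dict:
    """Ratios between MERaLiON-2-10B-ASR and Polyglot-Lion-1.7B, computed
    from the figures quoted in the abstract and the Figure 1 caption."""
    # Nominal sizes in billions of parameters, read off the model names
    # (assumption: exact parameter counts may differ slightly).
    baseline_params, lion_params = 10.0, 1.7
    # Inference latency in seconds per sample.
    baseline_latency, lion_latency = 2.02, 0.10
    # Training cost in US dollars; the $18,862 figure belongs to the
    # 128-GPU baseline, assumed here to be MERaLiON-2-10B-ASR.
    baseline_cost, lion_cost = 18_862, 81
    # Average error rate (WER/CER) across the 12 benchmarks.
    baseline_err, lion_err = 14.32, 14.85
    return {
        "param_ratio": baseline_params / lion_params,   # about 5.9x
        "speedup": baseline_latency / lion_latency,     # about 20.2x
        "cost_ratio": baseline_cost / lion_cost,        # about 233x
        "error_gap": lion_err - baseline_err,           # about 0.53 points
    }


if __name__ == "__main__":
    for name, value in headline_ratios().items():
        print(f"{name}: {value:.2f}")
```

The "6x" and "20x" figures are therefore rounded (about 5.9x and 20.2x), and the training cost gap is roughly two orders of magnitude larger than the 0.53-point accuracy gap might suggest.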