Automated evaluation of children's speech fluency for low-resource languages
Bowen Zhang, Nur Afiqah Abdul Latiff, Justin Kan, Rong Tong, Donny Soh, Xiaoxiao Miao, Ian McLoughlin
TL;DR
This work tackles automatic assessment of children's speech fluency in low-resource languages by integrating a fine-tuned multilingual ASR model with an objective metrics extractor and a GPT-based meta-evaluator. The approach employs data augmentation and LoRA-based fine-tuning to adapt the ASR model to Malay and Tamil, then passes WER/CER/PER, pause metrics, and speech rate as inputs to a GPT model that predicts fluency. GPT-based meta-evaluation outperforms traditional ML baselines and a multimodal GPT, achieving high accuracy, particularly for Malay. The findings demonstrate a scalable pathway for automated fluency assessment in very low-resource languages, with potential applicability to other mother-tongue contexts.
Abstract
Assessment of children's speaking fluency in education is well researched for majority languages, but remains highly challenging for low-resource languages. This paper proposes a system to automatically assess fluency by combining a fine-tuned multilingual ASR model, an objective metrics extraction stage, and a generative pre-trained transformer (GPT) network. The objective metrics include phonetic and word error rates, speech rate, and speech-pause duration ratio. These are interpreted by a GPT-based classifier, guided by a small set of human-evaluated ground truth examples, to score fluency. We evaluate the proposed system on a dataset of children's speech in two low-resource languages, Tamil and Malay, and compare its classification performance against Random Forest and XGBoost classifiers, as well as against ChatGPT-4o predicting fluency directly from speech input. Results demonstrate that the proposed approach achieves significantly higher accuracy than the multimodal GPT or the other methods.
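
As an illustration of the objective metrics stage, the following minimal Python sketch computes a word error rate, a character error rate, a speech rate, and a pause ratio from a reference text and a time-aligned ASR hypothesis. It is not the authors' implementation. The names `Word`, `fluency_metrics`, and `min_pause`, the 0.2 s pause threshold, and the definition of the pause ratio as pause time divided by speaking time are all assumptions made here for illustration; the paper's exact definitions may differ.

```python
from dataclasses import dataclass


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by reference length (WER or CER)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


@dataclass
class Word:
    """Hypothetical container for one ASR word with timestamps in seconds."""
    text: str
    start: float
    end: float


def fluency_metrics(reference, words, min_pause=0.2):
    """Compute illustrative fluency metrics.

    min_pause is an assumed threshold: shorter gaps are not counted as pauses.
    """
    if not words:
        raise ValueError("hypothesis must contain at least one word")
    total = words[-1].end - words[0].start
    if total <= 0:
        raise ValueError("word timestamps must span a positive duration")

    hyp = [w.text for w in words]
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list("".join(hyp))

    # Sum the silent gaps between consecutive words that exceed the threshold.
    pause_time = 0.0
    for a, b in zip(words, words[1:]):
        gap = b.start - a.end
        if gap >= min_pause:
            pause_time += gap
    speech_time = total - pause_time

    return {
        "wer": error_rate(reference.split(), hyp),
        "cer": error_rate(ref_chars, hyp_chars),
        "speech_rate_wps": len(words) / total,  # words per second
        # Assumed definition: pause time relative to speaking time.
        "pause_ratio": pause_time / speech_time if speech_time > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Made-up example utterance and timestamps, for demonstration only.
    hyp_words = [
        Word("saya", 0.0, 0.5),
        Word("suka", 0.5, 1.0),
        Word("makan", 1.5, 2.0),
        Word("nasik", 2.0, 2.5),
    ]
    print(fluency_metrics("saya suka makan nasi", hyp_words))
```

A dictionary like the one returned here could be serialised into the prompt of a GPT-based classifier alongside a few human-scored examples, which is the role the abstract assigns to the meta-evaluation stage.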
