Multilingual MFA: Forced Alignment on Low-Resource Related Languages
Alessio Tosolini, Claire Bowern
TL;DR
The paper addresses forced alignment for low-resource, phonologically related Australian languages by comparing multilingual training with crosslingual adaptation of a large English Montreal Forced Aligner (MFA) model. It trains models from scratch on a multilingual subset and also adapts English models to related languages. It evaluates each model on seen data, on unseen data from a seen language, and on unseen data from an unseen language. Results show that English-based models, especially when adapted, generally yield higher precision and vowel-space fidelity. Training on multilingual data provides notable gains for completely unseen languages but can dilute performance on seen data. The work demonstrates the practical value of pretrained English MFA models for fieldwork and language documentation. It also highlights persistent challenges with rhotics and trills and suggests data augmentation as a future direction.
Abstract
We compare the outcomes of multilingual and crosslingual training for related and unrelated Australian languages with similar phonological inventories. We use the Montreal Forced Aligner to train acoustic models from scratch and to adapt a large English model, evaluating results on seen data, on unseen data from a seen language, and on unseen data from an unseen language. Results indicate benefits of adapting the English baseline model for previously unseen languages.
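Alignment quality in evaluations like these is often summarized by how many aligner-placed phone boundaries fall close to hand-corrected ones. The sketch below shows one such measure. It is an illustration only: the function name `boundary_agreement`, the 20 ms tolerance, and the index-wise pairing of boundaries (which assumes the aligner and the gold annotation cover the same phone sequence) are all assumptions, not necessarily the metric used in this work.

```python
from typing import Sequence


def boundary_agreement(
    predicted: Sequence[float],
    gold: Sequence[float],
    tolerance: float = 0.02,  # assumed 20 ms window, in seconds
) -> float:
    """Hypothetical metric: share of boundaries within `tolerance` of gold.

    Boundaries are paired by position, which assumes both alignments
    segment the same phone sequence (typical for forced alignment
    against a fixed transcript).
    """
    if len(predicted) != len(gold):
        raise ValueError("alignments must cover the same phone sequence")
    if not gold:
        raise ValueError("no boundaries to compare")
    # Count paired boundaries whose absolute offset is within the window.
    within = sum(abs(p - g) <= tolerance for p, g in zip(predicted, gold))
    return within / len(gold)


if __name__ == "__main__":
    gold = [0.00, 0.10, 0.25, 0.40]       # hand-corrected boundaries (s)
    predicted = [0.01, 0.13, 0.26, 0.40]  # aligner output (s)
    print(boundary_agreement(predicted, gold))  # 3 of 4 within 20 ms
```

Pairing by position keeps the measure simple; a real evaluation on mismatched segmentations would need an explicit matching step instead.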
