A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages
Jessica Nieder, Johann-Mattis List
TL;DR
This paper addresses the computational assessment of mutual intelligibility among closely related languages by extending Linear Discriminative Learning (LDL) with multilingual semantic vectors and Dolgopolsky-style sound classes. Comprehension is formalized as a linear mapping $C F = S$ from a cue matrix $C$ to a semantic matrix $S$, with predicted semantics $\hat{S} = C F$. The approach combines a curated cognate dataset, multilingual embeddings, and phonetic sound-class representations to model cross-language comprehension, and it tests how inflection trimming and the choice of language pair affect performance. Key findings show that using 4-gram sound-class chunks yields higher accuracy, that trimming inflectional endings interacts with language-pair effects, and that cross-language results qualitatively mirror human data. The method offers a uniform, scalable framework for automatic mutual intelligibility testing and provides insights into the cognitive mechanisms underlying comprehension across related languages.
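A minimal worked form of the comprehension mapping follows. It assumes the standard LDL setup: a binary cue matrix $C$ with one row per word form, a semantic matrix $S$ with one row per meaning, $F$ estimated by least squares, and a correlation-based accuracy criterion. The paper's exact estimation and evaluation choices may differ. Here $C^{+}$ is the Moore–Penrose pseudoinverse, which gives the minimum-norm least-squares solution, and $r$ is the Pearson correlation between a predicted row $\hat{s}_i$ and a target row $s_j$:

$$
\hat{S} = C F, \qquad F = C^{+} S \in \operatorname*{arg\,min}_{F} \lVert C F - S \rVert_F^{2}
$$

$$
\mathrm{acc} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[\, \arg\max_{j} \, r(\hat{s}_i, s_j) = i \,\right]
$$

When $C$ has full row rank, $C C^{+} = I$, so $\hat{S} = S$ exactly on the training forms. In a cross-language test, the $C$ rows of the other language's forms are multiplied by the same $F$.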
Abstract
Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we extend with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach not only offers new methodological findings for the automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.
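The cross-language test described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' code or data. This includes the ASCII transcriptions, the simplified sound-class table (only loosely modelled on Dolgopolsky's classes), the `-en` trimming rule, the random vectors standing in for multilingual semantic embeddings, and all function names (`sound_classes`, `trim`, `comprehension`, and so on). $F$ is estimated on German forms and then applied unchanged to Dutch cognates that share the same semantic rows. Comprehension therefore succeeds only insofar as a Dutch form activates the same sound-class 4-gram cues.

```python
import numpy as np

# Hypothetical, simplified consonant classes loosely inspired by Dolgopolsky;
# NOT the paper's table. All vowels collapse to "V"; unknown symbols become "0".
CLASSES = {"p": "P", "b": "P", "f": "P", "t": "T", "d": "T", "s": "S",
           "k": "K", "g": "K", "x": "K", "m": "M", "n": "N", "r": "R",
           "l": "R", "w": "W", "v": "W", "j": "J", "h": "H"}
VOWELS = set("aeiou")


def sound_classes(form):
    """Convert a toy ASCII transcription into a sound-class string."""
    return "".join("V" if ch in VOWELS else CLASSES.get(ch, "0") for ch in form)


def trim(form, endings=("en",), min_stem=2):
    """Hypothetical inflection trimming: drop the first matching ending."""
    for end in endings:
        if form.endswith(end) and len(form) - len(end) >= min_stem:
            return form[: -len(end)]
    return form


def ngrams(classes, n=4):
    """Sound-class n-grams of a word padded with '#' boundary markers."""
    padded = "#" + classes + "#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cue_matrix(forms, cues, n=4):
    """Binary cue matrix C; n-grams not among the training cues are ignored."""
    index = {cue: j for j, cue in enumerate(cues)}
    C = np.zeros((len(forms), len(cues)))
    for i, form in enumerate(forms):
        for g in ngrams(sound_classes(form), n):
            if g in index:
                C[i, index[g]] = 1.0
    return C


def accuracy(S_hat, S):
    """Share of rows whose prediction correlates best with its own target row."""
    A = S_hat - S_hat.mean(axis=1, keepdims=True)
    B = S - S.mean(axis=1, keepdims=True)
    a_norm = np.linalg.norm(A, axis=1)
    b_norm = np.linalg.norm(B, axis=1)
    r = (A @ B.T) / np.maximum(np.outer(a_norm, b_norm), 1e-12)
    # A prediction with no activated cues (all zeros) always counts as wrong.
    correct = (r.argmax(axis=1) == np.arange(len(S))) & (a_norm > 0)
    return float(correct.mean())


def comprehension(train_forms, test_forms, S, use_trim=False, n=4):
    """Estimate F on one language, then apply it to another language's forms."""
    if use_trim:
        train_forms = [trim(f) for f in train_forms]
        test_forms = [trim(f) for f in test_forms]
    cues = sorted(set().union(*(ngrams(sound_classes(f), n) for f in train_forms)))
    C_train = cue_matrix(train_forms, cues, n)
    F = np.linalg.pinv(C_train) @ S  # minimum-norm least-squares solution of C F = S
    train_acc = accuracy(C_train @ F, S)
    test_acc = accuracy(cue_matrix(test_forms, cues, n) @ F, S)
    return train_acc, test_acc


# Toy German/Dutch cognate transcriptions (row i = same concept in both lists).
german = ["hant", "vaser", "haus", "maus", "fis", "bux", "maxen", "trinken"]
dutch = ["hant", "water", "hois", "mois", "vis", "buk", "maken", "drinken"]
# Random placeholders for multilingual semantic vectors shared across languages.
S = np.random.default_rng(1).normal(size=(len(german), 20))

for use_trim in (False, True):
    train_acc, test_acc = comprehension(german, dutch, S, use_trim)
    print(f"trim={use_trim}: German (train) {train_acc:.2f}, Dutch (test) {test_acc:.2f}")
```

On this toy set the sketch reaches 1.00 on the German training forms and 0.75 on Dutch in both trimming conditions. Two Dutch forms fail. *water* (WVTVR) activates no German cue. *vis* (WVS) activates only the cue `#WVS` from *vaser*, so it is "understood" as the WATER concept. The trimming switch only shows where that step enters the pipeline. It is applied to both languages here, and the toy data are too small to reproduce the trimming effect reported in the abstract.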
