A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages

Jessica Nieder; Johann-Mattis List

TL;DR

This paper assesses mutual intelligibility among closely related languages computationally by extending Linear Discriminative Learning (LDL) with multilingual semantic vectors and Dolgopolsky-style sound classes. Comprehension is formalized as a linear mapping $C F = S$ from form cues $C$ to semantic vectors $S$, which yields the predicted semantics $\hat{S}$. The model combines a curated cognate dataset, multilingual embeddings, and phonetic sound-class representations, and the experiments test how inflection trimming and the choice of language pair affect performance. Key findings are that 4-gram sound-class chunks yield higher accuracy, that trimming inflectional endings interacts with language-pair effects, and that cross-language results qualitatively mirror human data. The approach offers a uniform, scalable framework for automatic mutual intelligibility testing and provides insights into the cognitive mechanisms underlying cross-language comprehension among related languages.

Abstract

Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on (1) the automatic trimming of inflections and (2) the language pair for which comprehension is tested. Our multilingual modelling approach not only offers new methodological findings for the automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.
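To make the comprehension mapping concrete, the following is a minimal sketch of a linear form-to-meaning mapping of the kind used in Linear Discriminative Learning, not the authors' implementation. It assumes NumPy, a toy set of hypothetical sound-class strings, one-hot stand-ins for the multilingual semantic vectors, and bigram cues instead of 4-grams to keep the example small. All names (`train_forms`, `ngrams`, `cue_vector`, `comprehend`) are illustrative.

```python
import numpy as np

# Hypothetical sound-class strings for three training words (toy data,
# not the paper's cognate set or its exact sound-class encoding).
train_forms = {"hand": "HVNT", "land": "RVNT", "band": "PVNT"}

def ngrams(form, n=2):
    """Return the n-gram cues of a form, with '#' marking word boundaries."""
    padded = "#" + form + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

words = list(train_forms)
cues = sorted({g for w in words for g in ngrams(train_forms[w])})

def cue_vector(form):
    """Binary vector marking which known cues occur in the form."""
    present = set(ngrams(form))
    return np.array([1.0 if c in present else 0.0 for c in cues])

# C: one row of cues per word; S: one semantic vector per word.
C = np.vstack([cue_vector(train_forms[w]) for w in words])
S = np.eye(len(words))  # assumption: one-hot stand-ins for real embeddings

# Solve C F = S in the least-squares sense via the pseudoinverse.
F = np.linalg.pinv(C) @ S
S_hat = C @ F  # predicted semantics for the training forms

def comprehend(form):
    """Map a form to semantics and return the closest known word by cosine."""
    s_hat = cue_vector(form) @ F
    sims = (S @ s_hat) / (np.linalg.norm(S, axis=1) * np.linalg.norm(s_hat))
    return words[int(np.argmax(sims))]

# Comprehension accuracy on the training forms themselves.
accuracy = np.mean([comprehend(train_forms[w]) == w for w in words])
```

In this sketch, a form from a related language would be passed to `comprehend` after conversion to the same sound-class alphabet; cues the model has never seen are ignored, so partially overlapping cognates can still be mapped to the nearest known meaning.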

Paper Structure (13 sections, 1 figure, 3 tables)

Introduction
Linear Discriminative Learning
Materials and Methods
Dataset of German Cognates
Multilingual Semantic Vectors
Multilingual Sound Classes
Trimming Word Forms
Linear Discriminative Learning Model
Implementation
Evaluation
Evaluation on Individual Languages
Evaluation Across Languages
Discussion and Conclusion

Figures (1)

Figure 1: Distribution of cosine similarity scores between language pairs for all cognate triplets. Note that smoothing of the distribution results in values exceeding 1.0.
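The note about values exceeding 1.0 can be illustrated with a small sketch. It assumes that the smoothing is a Gaussian kernel density estimate, which the caption does not state, and it uses made-up similarity scores. The helper `gaussian_kde` is written out by hand here and is not taken from any library.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density

# Hypothetical cosine similarities; none of them exceeds 1.0.
similarities = [0.90, 0.95, 0.97, 0.99, 1.00]

density = gaussian_kde(similarities, bandwidth=0.05)

# The smoothed curve still has positive density just above 1.0,
# because each kernel spreads mass on both sides of its sample.
density_above_one = density(1.05)
```

The underlying scores stay within the valid cosine range; only the smoothed curve extends past 1.0.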
