Computational Typology
Gerhard Jäger
TL;DR
The paper addresses how to test language universals and cross-linguistic correlations robustly with computational typology, explicitly accounting for the non-independence of languages that arises from genealogy and areal contact. Using hierarchical and phylogenetic Bayesian models, it shows that the correlation between affix position and adposition type reflects a genuine diachronic tendency, whereas the apparent link between population size and phoneme inventory size largely reflects shared ancestry. Bivariate models for mixed data types are fitted in vanilla, hierarchical, and phylogenetic formulations, and the comparison shows substantial gains in model fit and interpretability once genealogical structure is accounted for. The findings underscore the need to integrate phylogenetic information into typological research in order to distinguish true co-evolution from historical contingency, with practical implications for large-scale linguistic databases and future studies in computational typology.
Abstract
Typology is a subfield of linguistics that focuses on the study and classification of languages based on their structural features. Unlike genealogical classification, which examines the historical relationships between languages, typology seeks to understand the diversity of human languages by identifying common properties and patterns, known as universals. In recent years, computational methods have played an increasingly important role in typological research, enabling the analysis of large-scale linguistic data and the testing of hypotheses about language structure and evolution. This article provides an illustration of the benefits of computational statistical modeling in typology.
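As a rough illustration of why statistical modeling matters here, the sketch below simulates two traits that have no causal link but each inherit an independent offset from their language family. It is a toy example, not the article's method: the family counts, variances, function names, and the simple family-centering step (a crude stand-in for family-level intercepts in a hierarchical model) are all assumptions made for illustration. The significance threshold uses the normal approximation 1.96 / sqrt(n).

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(rng, n_families=8, per_family=40, family_sd=3.0):
    """Two unrelated traits; each gets an independent family-level offset.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rows = []
    for fam in range(n_families):
        fx = rng.gauss(0, family_sd)  # inherited offset for trait x
        fy = rng.gauss(0, family_sd)  # inherited offset for trait y
        for _ in range(per_family):
            rows.append((fam, fx + rng.gauss(0, 1), fy + rng.gauss(0, 1)))
    return rows


def center_by_family(rows):
    """Subtract each family's mean from its members' trait values."""
    groups = {}
    for fam, x, y in rows:
        groups.setdefault(fam, []).append((x, y))
    xs, ys = [], []
    for members in groups.values():
        mx = fmean([x for x, _ in members])
        my = fmean([y for _, y in members])
        for x, y in members:
            xs.append(x - mx)
            ys.append(y - my)
    return xs, ys


def false_positive_rates(n_sims=200, seed=0):
    """Share of simulations in which |r| passes a nominal 5% threshold."""
    rng = random.Random(seed)
    naive_hits = 0
    centered_hits = 0
    for _ in range(n_sims):
        rows = simulate(rng)
        threshold = 1.96 / len(rows) ** 0.5  # normal approximation
        xs = [x for _, x, _ in rows]
        ys = [y for _, _, y in rows]
        if abs(pearson(xs, ys)) > threshold:
            naive_hits += 1  # languages treated as independent
        cx, cy = center_by_family(rows)
        if abs(pearson(cx, cy)) > threshold:
            centered_hits += 1  # family means removed first
    return naive_hits / n_sims, centered_hits / n_sims


naive_rate, centered_rate = false_positive_rates()
print(f"naive false-positive rate:    {naive_rate:.2f}")
print(f"centered false-positive rate: {centered_rate:.2f}")
```

Treating every language as an independent data point flags a "significant" correlation far more often than the nominal 5%, because the effective sample size is closer to the number of families than to the number of languages. Removing family means brings the rate back near the nominal level in this toy setting.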
