Word length predicts word order: "Min-max"-ing drives language evolution

Hiram Ring

TL;DR

The paper addresses how word order evolves by proposing a universal Min-Max theory in which speakers balance processing efficiency against information maximization during production. It operationalizes this claim with a dataset of 1,942 POS-tagged language corpora and uses the N1 ratio alongside noun/verb length measures to predict basic word order, outperforming genealogical and areal explanations. Predictive validity is demonstrated diachronically in Semitic languages and Old Irish, and variance in word order is shown to be more strongly linked to noun/verb lengths than to descent once regional area is accounted for. The findings support a parsimonious, information-centered mechanism for language change and highlight the value of large-scale corpora for typology and diachronic linguistics.

Abstract

A fundamental concern in linguistics has been to understand how languages change, such as in relation to word order. Since the order of words in a sentence (i.e. the relative placement of Subject, Object, and Verb) is readily identifiable in most languages, this has been a productive field of study for decades (see Greenberg 1963; Dryer 2007; Hawkins 2014). However, a language's word order can change over time, with competing explanations for such changes (Carnie and Guilfoyle 2000; Crisma and Longobardi 2009; Martins and Cardoso 2018; Dunn et al. 2011; Jager and Wahle 2021). This paper proposes a general universal explanation for word order change based on a theory of communicative interaction (the Min-Max theory of language behavior) in which agents seek to minimize effort while maximizing information. Such an account unifies opposing findings from language processing (Piantadosi et al. 2011; Wasow 2022; Levy 2008) that make different predictions about how word order should be realized crosslinguistically. The marriage of both "efficiency" and "surprisal" approaches under the Min-Max theory is justified with evidence from a massive dataset of 1,942 language corpora tagged for parts of speech (Ring 2025), in which average lengths of particular word classes correlate with word order, allowing for prediction of basic word order from diverse corpora. The general universal pressure of word class length in corpora is shown to give a stronger explanation for word order realization than either genealogical or areal factors, highlighting the importance of language corpora for investigating such questions.
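
The core measurement mentioned in the abstract, average word length per word class in a POS-tagged corpus, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name mean_lengths_by_pos, the toy corpus, the tag labels, and the choice to measure length in characters are hypothetical. The sketch does not reproduce the paper's N1 ratio or its normalization. It only shows an unweighted (per word type) and a frequency-weighted (per token) mean length for each tag.

```python
from collections import Counter, defaultdict


def mean_lengths_by_pos(tagged_tokens):
    """Map each POS tag to (type_mean, token_mean) word length in characters.

    type_mean  : each distinct word form counts once.
    token_mean : each word form is weighted by its corpus frequency.
    """
    counts = defaultdict(Counter)
    for word, pos in tagged_tokens:
        counts[pos][word.lower()] += 1

    result = {}
    for pos, counter in counts.items():
        type_mean = sum(len(w) for w in counter) / len(counter)
        token_mean = (
            sum(len(w) * n for w, n in counter.items()) / sum(counter.values())
        )
        result[pos] = (type_mean, token_mean)
    return result


# Hypothetical toy corpus of (word, tag) pairs; a real corpus would be far larger.
toy = [
    ("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
    ("the", "DET"), ("dog", "NOUN"), ("squirrel", "NOUN"), ("ran", "VERB"),
]
lengths = mean_lengths_by_pos(toy)
noun_type_mean, noun_token_mean = lengths["NOUN"]
verb_type_mean, verb_token_mean = lengths["VERB"]
```

In this toy corpus the frequent short noun "dog" pulls the frequency-weighted noun mean below the unweighted one, which is why the two measures can diverge in real corpora.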

Paper Structure

This paper contains 11 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Min-Max schema
  • Figure 2: The N1 ratio and word order in 3 typological databases (p < 0.001; Ring 2025)
  • Figure 3: Normalized lengths of arguments and predicates, by word order
  • Figure 4: Frequency-weighted lengths of nouns and verbs, by word order
  • Figure 5: Min-Max schema