Study of scaling laws in language families

Maelyson R. F. Santos, Marcelo A. F. Gomes

TL;DR

The study addresses scaling laws in language diversity at both the macro level (number of languages per family) and the micro level (number of speakers per language). It analyzes the twentieth edition of Ethnologue (6711 languages, 141 families) and finds a macro-scale power law $N_F \sim r^{-\theta}$ with two regimes, $\theta = 1.5$ for intermediate ranks and $\theta = 2.0$ for large ranks, forming a Hollow Curve. Within the fourteen largest families, the number of speakers per language follows $N \sim r^{-\kappa}$: twelve of these families fall into three quadruplets with exponents $\kappa = 1.15 \pm 0.05$, $1.65 \pm 0.05$, and $2.05 \pm 0.05$, while Afro-Asiatic and Nilo-Saharan are outliers ($\kappa = 2.6$ and $1.4$, respectively). These findings offer a quantitative framework for linguistic diversity and human migratory processes, suggesting structured diffusion effects across major language families and guiding future historical-linguistic modeling.

Abstract

This article investigates scaling laws within language families using data from over six thousand languages and analyzing emergent patterns observed in Zipf-like classification graphs. Both macroscopic (based on the number of languages per family) and microscopic (based on the number of speakers per language within a family) aspects of these classifications are examined. Particularly noteworthy is the discovery of a distinct division among the fourteen largest contemporary language families, excluding the Afro-Asiatic and Nilo-Saharan families. The remaining twelve families are found to be distributed across three language family quadruplets, each characterized by a significantly different exponent in the Zipf graphs. This finding sheds light on the underlying structure and organization of major language families and on the nature of linguistic diversity and distribution.

Paper Structure

This paper contains 4 sections, 2 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Number of languages $N_F$ in each language family as a function of the family's rank $r$. The dotted (dashed) line with slope -2.0 (-1.5) corresponds to scaling behavior associated with stable distributions (Takayasu). Each scaling exponent describes the data over approximately one decade of $r$.
  • Figure 2: Number of speakers $N$ per language as a function of rank $r$ for the Niger-Congo, Trans-New Guinea, Otomanguean and Tai-Kadai families. The dashed line is a guide to the eye for the fit $N \sim r^{-\kappa}$ with $\kappa = 1.15$.
  • Figure 3: Number of speakers $N$ per language as a function of rank $r$ for the Austronesian, Sino-Tibetan, Indo-European and Australian families. The dashed line is a guide to the eye for the fit $N \sim r^{-\kappa}$ with $\kappa = 1.65$.
  • Figure 4: Number of speakers $N$ per language as a function of rank $r$ for the Austroasiatic, Dravidian, Tupian and Uto-Aztecan families. The dashed line is a guide to the eye for the fit $N \sim r^{-\kappa}$ with $\kappa = 2.05$.
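
The rank-size plots in these figures can be reproduced in outline. The sketch below is a minimal illustration, not the authors' procedure: it estimates an exponent $\kappa$ in $N \sim r^{-\kappa}$ by ordinary least squares on log-log axes. The function name `zipf_exponent`, the rank-window parameters `r_min` and `r_max`, and the synthetic data are all assumptions introduced here. The paper describes its dashed lines as guides to the eye, so its quoted exponents need not come from a regression of this kind.

```python
import numpy as np


def zipf_exponent(sizes, r_min=1, r_max=None):
    """Estimate kappa in N ~ r^(-kappa) from a list of sizes.

    Hypothetical helper: sorts sizes in descending order, assigns ranks
    1..n, and fits a straight line to log10(N) versus log10(r) inside
    the rank window [r_min, r_max].
    """
    ordered = np.sort(np.asarray(sizes, dtype=float))[::-1]
    ranks = np.arange(1, ordered.size + 1)
    if r_max is None:
        r_max = ordered.size
    # Keep only ranks in the chosen window; drop zero sizes (log undefined).
    mask = (ranks >= r_min) & (ranks <= r_max) & (ordered > 0)
    slope, _intercept = np.polyfit(
        np.log10(ranks[mask]), np.log10(ordered[mask]), 1
    )
    return -slope


# Synthetic, noise-free data (not Ethnologue data): an exact power law
# with kappa = 1.65, so the fit should recover that exponent.
synthetic_ranks = np.arange(1, 101, dtype=float)
synthetic_sizes = 1e6 * synthetic_ranks ** -1.65

kappa_full = zipf_exponent(synthetic_sizes)
kappa_window = zipf_exponent(synthetic_sizes, r_min=10, r_max=100)
print(f"kappa (all ranks) = {kappa_full:.2f}")
print(f"kappa (ranks 10-100) = {kappa_window:.2f}")
```

Restricting the fit to a rank window matters for real data, where a single exponent typically holds only over a limited range of $r$ (about one decade in Figure 1). Least squares on log-log axes is also known to be a biased estimator for power-law exponents, so a value obtained this way is best treated as a rough check.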