Are Sounds Sound for Phylogenetic Reconstruction?

Luise Häuser; Gerhard Jäger; Taraka Rama; Johann-Mattis List; Alexandros Stamatakis

TL;DR

This study evaluates whether phylogenies inferred from sound correspondence patterns can match or exceed those derived from lexical cognates in historical linguistics. Using ten Lexibank datasets, it derives cognate and sound-correspondence matrices, applies alignment trimming and pattern detection, and infers trees with both Bayesian and maximum-likelihood (ML) methods, comparing their topologies to gold-standard Glottolog trees via the generalized quartet distance (GQD). Across analyses, cognate-based trees (and concatenated cognate+sound trees) are generally closer to the gold standards than sound-only trees, highlighting limitations of sound-based signals for automated phylogenetic inference. The work also documents prior-bias issues in Bayesian language analyses and emphasizes the need for complementary ML checks and broader datasets. It suggests cautious use of sound-based evidence and points to potential gains from data integration and refined priors in future research.

Abstract

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as the major data source for phylogenetic reconstruction in linguistics, although a few studies praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer to the gold standard phylogenies than phylogenies reconstructed from sound correspondences, by approximately one third on average with respect to the generalized quartet distance.
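To make the evaluation metric concrete, the following is a minimal sketch of a generalized quartet distance, not the implementation used in the paper. It assumes the common definition in which only quartets resolved in the (possibly non-binary) gold-standard tree are counted, and the distance is the fraction of those quartets that the inferred tree resolves differently or leaves unresolved. Trees are represented as nested tuples of leaf names; the function names and the toy trees are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all internal nodes of a nested-tuple tree."""
    if isinstance(tree, str):
        return []
    result = [leaves(tree)]
    for child in tree:
        result.extend(clades(child))
    return result


def quartet_topology(tree_clades, quartet):
    """Return the split {{a, b}, {c, d}} induced on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        inside = clade & q
        # A clade holding exactly two of the four taxa separates them from the other two.
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def generalized_quartet_distance(gold, inferred):
    """Fraction of quartets resolved in `gold` that `inferred` does not reproduce."""
    taxa = sorted(leaves(gold))
    gold_clades, inferred_clades = clades(gold), clades(inferred)
    resolved = differing = 0
    for quartet in combinations(taxa, 4):
        gold_split = quartet_topology(gold_clades, quartet)
        if gold_split is None:
            continue  # polytomy in the gold standard: quartet carries no signal
        resolved += 1
        if quartet_topology(inferred_clades, quartet) != gold_split:
            differing += 1
    return differing / resolved if resolved else 0.0


# Hypothetical toy trees: the gold standard has a polytomy at the root.
gold_tree = (("A", "B"), ("C", "D"), "E")
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
gqd = generalized_quartet_distance(gold_tree, inferred_tree)
```

Enumerating all quartets is only practical for small trees; it is used here because it keeps the definition visible.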

Paper Structure (12 sections, 1 figure, 5 tables)

Introduction
Background
Materials and Methods
Materials
Methods
Bayesian Inference
Maximum Likelihood Tree Inferences
Implementation
Results
Bayesian Inference
Maximum Likelihood
Discussion and Conclusion

Figures (1)

Figure 1: Gain-loss processes derived from binary cognate vectors. A shows a wordlist where cognate words are encoded as multi-state characters. B shows the corresponding binary encoding. C shows how gain and loss processes are modeled on a phylogenetic tree.
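The step from panel A to panel B of Figure 1 can be illustrated with a small sketch. It assumes the usual convention that each cognate class becomes one binary column, with `1` for presence, `0` for absence, and `?` where a language has no word recorded for the concept. The function name, the input layout, and the toy wordlist are hypothetical and are not taken from the paper.

```python
def binarize_cognates(wordlist):
    """Convert multi-state cognate data to binary presence/absence strings.

    `wordlist` maps language -> {concept: cognate class label, or None if missing}.
    Returns the column labels and a dict mapping language -> binary string.
    """
    languages = sorted(wordlist)

    # Collect the cognate classes attested for each concept.
    classes = {}
    for lang in languages:
        for concept, cog in wordlist[lang].items():
            if cog is not None:
                classes.setdefault(concept, set()).add(cog)

    # One binary column per (concept, cognate class) pair, in a fixed order.
    columns = [(c, k) for c in sorted(classes) for k in sorted(classes[c])]

    matrix = {}
    for lang in languages:
        row = []
        for concept, cog_class in columns:
            observed = wordlist[lang].get(concept)
            if observed is None:
                row.append("?")  # missing data, not absence of the cognate class
            else:
                row.append("1" if observed == cog_class else "0")
        matrix[lang] = "".join(row)
    return columns, matrix


# Hypothetical toy wordlist with two concepts and three languages.
toy_wordlist = {
    "LangA": {"hand": 1, "foot": 1},
    "LangB": {"hand": 1, "foot": 2},
    "LangC": {"hand": 2, "foot": None},
}
columns, matrix = binarize_cognates(toy_wordlist)
```

Keeping `?` distinct from `0` matters for gain-loss models, since an unrecorded word should not be read as the loss of a cognate class.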
