Over-representation of phonological features in basic vocabulary doesn't replicate when controlling for spatial and phylogenetic effects

Frederic Blum

TL;DR

The paper challenges prior claims that basic vocabulary universally over-represents certain phonological features by reanalyzing with a much larger, Lexibank-derived sample and by explicitly modeling phylogenetic and areal dependencies. Using a Bayesian Dirichlet framework with dual multilevel intercepts for genealogy and geography, the study finds that most previously reported patterns do not hold when dependencies are controlled, with only a small subset (notably some pronouns and body terms) remaining robust. The work demonstrates the importance of robustness analyses and large-scale, bias-controlled sampling for typological generalizations, and it provides open data and code to facilitate replication. Overall, the results temper claims of widespread sound symbolism in basic vocabulary and highlight the nuanced, context-dependent nature of such patterns across languages.

Abstract

The statistical over-representation of phonological features in the basic vocabulary of languages is often interpreted as reflecting potentially universal sound symbolic patterns. However, most of those results have not been tested explicitly for reproducibility and might be prone to biases in the study samples or models. Many studies on the topic do not adequately control for genealogical and areal dependencies between sampled languages, casting doubt on the robustness of the results. In this study, we test the robustness of a recent study on sound symbolism of basic vocabulary concepts which analyzed 245 languages. The new sample includes data on 2864 languages from Lexibank. We modify the original model by adding statistical controls for spatial and phylogenetic dependencies between languages. The new results show that most of the previously observed patterns are not robust, and in fact many patterns disappear completely when adding the genealogical and areal controls. A small number of patterns, however, emerge as highly stable even with the new sample. Through the new analysis, we are able to assess the distribution of sound symbolism on a larger scale than was previously possible. The study further highlights the need for testing all universal claims on language for robustness on various levels.
