The exponential distribution of the order of demonstrative, numeral, adjective and noun

Ramon Ferrer-i-Cancho

TL;DR

This study analyzes the distribution of the 24 possible orders of demonstrative, numeral, adjective and noun in the noun phrase across languages to determine whether their rank frequencies follow an exponential (geometric) or a power-law form. Using a family of right-truncated models (Zeta and Geometric) evaluated with information criteria ($AIC_c$ and $BIC$), the study finds that exponential-like geometric models, particularly Geometric 1 with $R=r_{max}$, provide a better and more generalizable fit than power-law alternatives. The results challenge the notion of hard, language-wide constraints on word order and support Cysouw's undersampling hypothesis, according to which unattested orders arise from sampling limitations. The findings also bear on the interpretation of linguistic laws more broadly, suggesting that exponential patterns may be more common and informative than Zipf-like power laws in linguistic and cognitive phenomena.

Abstract

The frequency of the preferred order for a noun phrase formed by demonstrative, numeral, adjective and noun has received significant attention over the last two decades. We investigate the actual distribution of the 24 possible orders. There is no consensus on whether it is better fitted by an exponential or a power-law distribution. We find that an exponential distribution is a much better model. This finding, together with other circumstances where an exponential-like distribution is found, challenges the view that power-law distributions, e.g., Zipf's law for word frequencies, are inevitable. We also investigate which of two exponential distributions gives a better fit: an exponential model where all 24 orders have non-zero probability (a geometric distribution truncated at rank 24) or an exponential model where the number of orders that can have non-zero probability is variable (a right-truncated geometric distribution). When consistency and generalizability are prioritized, we find higher support for the exponential model where all 24 orders have non-zero probability. These findings strongly suggest that there is no hard constraint on word order variation and that unattested orders merely result from undersampling, consistent with Cysouw's view.
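The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes the textbook forms of the two right-truncated distributions over ranks $r = 1, \dots, R$, namely $p(r) \propto (1-q)^{r-1}$ for the geometric and $p(r) \propto r^{-\alpha}$ for the zeta, which may differ in detail from the paper's own definitions. The function names, the grid-search fit, and the counts are all hypothetical: the counts are synthetic values generated from a geometric curve for illustration and are not data from the study.

```python
import math

def geometric_logpmf(r, q, R):
    # Right-truncated geometric: p(r) = q (1-q)^(r-1) / (1 - (1-q)^R), r = 1..R
    return math.log(q) + (r - 1) * math.log(1 - q) - math.log(1 - (1 - q) ** R)

def zeta_logpmf(r, alpha, R):
    # Right-truncated zeta: p(r) = r^(-alpha) / sum_{k=1..R} k^(-alpha)
    norm = sum(k ** -alpha for k in range(1, R + 1))
    return -alpha * math.log(r) - math.log(norm)

def log_likelihood(logpmf, param, counts, R):
    # counts[i] is the frequency of the order with rank i + 1
    return sum(f * logpmf(r, param, R)
               for r, f in enumerate(counts, start=1) if f > 0)

def fit(logpmf, counts, R, lo, hi, steps=2000):
    # Crude one-parameter maximum likelihood by grid search on the open interval (lo, hi)
    grid = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    best = max(grid, key=lambda p: log_likelihood(logpmf, p, counts, R))
    return best, log_likelihood(logpmf, best, counts, R)

def aic_c(loglik, k, n):
    # Akaike information criterion with the small-sample correction
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    # Bayesian information criterion
    return -2 * loglik + k * math.log(n)

# Synthetic, illustrative rank counts (NOT data from the paper).
R = 24
counts = [round(1000 * 0.3 * 0.7 ** (r - 1)) for r in range(1, R + 1)]
n = sum(counts)

q_hat, ll_geom = fit(geometric_logpmf, counts, R, 0.0, 1.0)
alpha_hat, ll_zeta = fit(zeta_logpmf, counts, R, 0.0, 5.0)

# Both models have one free parameter here because R is fixed at 24.
for name, ll in (("geometric", ll_geom), ("zeta", ll_zeta)):
    print(name, "AICc =", aic_c(ll, 1, n), "BIC =", bic(ll, 1, n))
```

With $R$ fixed at 24, both models have a single free parameter, so the penalty terms are equal and the lower $AIC_c$ or $BIC$ simply belongs to the model with the higher log-likelihood. Treating $R$ as an additional free parameter, as in the variable-truncation model mentioned in the abstract, would raise the parameter count `k` and make the penalty terms matter.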
