Recursive numeral systems are highly regular and easy to process
Ponrawee Prasertsom, Andrea Silvi, Jennifer Culbertson, Moa Johansson, Devdatt Dubhashi, Kenny Smith
TL;DR
The paper reframes the efficiency of recursive numeral systems around regularity and processing complexity, arguing that MDL-based measures of these two properties separate natural from unattested systems better than the previously proposed trade-off between lexicon size and morphosyntactic complexity. It measures irregularity as the complexity of a minimal partial DFA and processing cost via MDL parsing, and applies both measures to natural languages, random baselines, and the systems identified as optimal in prior work. The results show that natural numeral systems are markedly more regular and easier to process, and that they remain near local Pareto frontiers under multiple controls and priors. The study highlights regularity as a key driver of human-like numeral systems and proposes methodological extensions that generalize efficiency analyses to broader, formation-based linguistic domains.
Abstract
Previous work has argued that recursive numeral systems optimise the trade-off between lexicon size and average morphosyntactic complexity (Denić and Szymanik, 2024). However, showing that only natural-language-like systems optimise this trade-off has proven elusive, and the existing solution has relied on ad-hoc constraints to rule out unnatural systems (Yang and Regier, 2025). Here, we argue that this issue arises because the proposed trade-off has neglected regularity, a crucial aspect of complexity central to human grammars in general. Drawing on the Minimum Description Length (MDL) approach, we propose that recursive numeral systems are better viewed as efficient with regard to their regularity and processing complexity. We show that our MDL-based measures of regularity and processing complexity better capture the key differences between attested, natural systems and unattested but possible ones, including "optimal" recursive numeral systems from previous work, and that the ad-hoc constraints from previous literature naturally follow from regularity. Our approach highlights the need to incorporate regularity across sets of forms in studies that attempt to measure and explain optimality in language.
