Learning Efficient Recursive Numeral Systems via Reinforcement Learning

Andrea Silvi; Jonathan Thomas; Emil Carlsson; Devdatt Dubhashi; Moa Johansson

TL;DR

The paper addresses how recursive numeral systems can emerge under pressures for efficient communication. It introduces a neuro-symbolic two-agent RL framework built on a slightly modified Hurford meta-grammar to allow grammar changes and optimization. The main contributions show that RL-guided interaction yields numeral systems that lie near the Pareto frontier of lexicon size and morphosyntactic complexity, with configurations bearing resemblance to human systems. This work provides a mechanistic explanation for the emergence of efficient recursive numeral systems and points to future work on iterated learning and distributional effects to further align with human languages.

Abstract

It has previously been shown that by using reinforcement learning (RL), agents can derive simple approximate and exact-restricted numeral systems that are similar to human ones (Carlsson, 2021). However, it is a major challenge to show how more complex recursive numeral systems, similar to, for example, English, could arise via a simple learning mechanism such as RL. Here, we introduce an approach towards deriving a mechanistic explanation of the emergence of efficient recursive number systems. We consider pairs of agents learning how to communicate about numerical quantities through a meta-grammar that can be gradually modified throughout the interactions. Utilising a slightly modified version of the meta-grammar of Hurford (1975), we demonstrate that our RL agents, shaped by the pressures for efficient communication, can effectively modify their lexicon towards Pareto-optimal configurations which are comparable to those observed within human numeral systems in terms of their efficiency.
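
To make "Pareto-optimal configurations" concrete, here is a minimal sketch of how a Pareto frontier could be extracted from candidate numeral systems scored by lexicon size and average morphosyntactic complexity, both treated as quantities to minimise. The function name `pareto_frontier` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (lexicon size, average morphosyntactic complexity); both are minimised.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    A point is dominated if another point is no worse on both
    objectives and strictly better on at least one.
    """
    frontier: List[Point] = []
    best_complexity = float("inf")
    # Sorting by size (then complexity) means a point survives only if it
    # has strictly lower complexity than every smaller-or-equal lexicon.
    for size, complexity in sorted(set(points)):
        if complexity < best_complexity:
            frontier.append((size, complexity))
            best_complexity = complexity
    return frontier


# Hypothetical scores, for illustration only.
candidates = [(10, 3.0), (12, 2.5), (12, 2.8), (15, 2.5), (20, 1.5), (18, 3.2)]
print(pareto_frontier(candidates))
```

A language "close to the Pareto frontier" in this sense is one whose point lies near the returned set; the paper's own complexity measures may differ from this two-number simplification.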

Paper Structure (12 sections, 4 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Efficiency of Recursive Numeral Systems
Meta-grammars for recursive number systems
Complexity Metrics
Reinforcement Learning for Grammars
Implementation Details
Results and Discussion
Communication Leads to More Efficient Languages
Reward Structure for Numeral Systems
Characteristics of Pareto Optimal Languages
Conclusions and Future Work
Acknowledgments

Figures (6)

Figure 1: Numeral systems from three different languages: the approximate numerals used in Chiquitano, the exact-restricted numerals used in Awa Pit, and the recursive numerals used in English. Colors indicate how words are assigned to numeral concepts.
Figure 2: Reproduction of Figure 4b from Xu et al. (2020). Restricted numeral systems appear to optimize the simplicity/informativeness trade-off (plotted here as its opposites, complexity and communicative cost), whereas recursive numeral systems (blue dots) do not: they lie far from the Pareto-optimal recursive numeral system (the left-most point of the blue line).
Figure 3: An example of a communication round: a number, e.g. $n=9$, is sampled from the need distribution. The corresponding representation in terms of the current $(D,M)$ is passed to the speaker, which encodes it neurally. The listener receives this message and outputs the number it thinks it refers to. Both agents are rewarded if the guess is correct.
Figure 4: Trajectories showing the evolution of the languages during communication between our agents, starting from points 1-5 in Table \ref{tab:starting_points} (for visual clarity, not all trajectories are included). The agents tend towards languages that are closer to the Pareto frontier. The grey area represents unobtainable configurations.
Figure 5: Final languages of our agents, plotted in terms of lexicon size and average morphosyntactic complexity. The final grammars tend to lie close to the Pareto frontier. Points are colour coded based on their starting $(D,M)$ pair.
...and 1 more figure
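
The communication round described in the Figure 3 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the power-law need distribution, the base-10 `represent` function standing in for the $(D,M)$ representation, and the hand-written `toy_speaker` and `toy_listener` (which replace the neural agents) are all hypothetical.

```python
import random
from typing import Callable, Tuple

Representation = Tuple[int, int]


def sample_need(max_n: int, rng: random.Random) -> int:
    # Assumption: need probability proportional to n^-2 (not stated in the caption).
    numbers = list(range(1, max_n + 1))
    weights = [n ** -2 for n in numbers]
    return rng.choices(numbers, weights=weights, k=1)[0]


def represent(n: int) -> Representation:
    # Hypothetical stand-in for the (D, M) representation: n = q * 10 + r.
    return divmod(n, 10)


def toy_speaker(rep: Representation) -> str:
    # A real speaker would encode the representation neurally.
    q, r = rep
    return f"{q}:{r}"


def toy_listener(message: str) -> int:
    # A real listener would be a learned decoder; this one is exact.
    q, r = message.split(":")
    return int(q) * 10 + int(r)


def communication_round(
    speaker: Callable[[Representation], str],
    listener: Callable[[str], int],
    max_n: int,
    rng: random.Random,
) -> Tuple[int, str, int, float]:
    n = sample_need(max_n, rng)          # sample a number from the need distribution
    message = speaker(represent(n))      # speaker encodes its representation
    guess = listener(message)            # listener outputs the number it infers
    reward = 1.0 if guess == n else 0.0  # shared reward when the guess is correct
    return n, message, guess, reward


rng = random.Random(0)
print(communication_round(toy_speaker, toy_listener, 99, rng))
```

In an RL setting, the shared `reward` would drive updates to both agents; here the agents are fixed, so the sketch only shows the flow of one round.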
