Lewis's Signaling Game as beta-VAE For Natural Word Lengths and Segments

Ryo Ueda; Tadahiro Taniguchi

TL;DR

This work addresses the gap between the statistics of emergent languages in emergent communication (EC) and those of natural languages by reframing Lewis's signaling game as a $\beta$-VAE and optimizing an ELBO with a learnable prior. The approach treats the prior over messages as a language model, enabling variable-length messages and a principled tradeoff between informativeness and processing cost, interpreted via surprisal theory. Empirical results show improvements on properties related to word length and segmentation (Zipf's law of abbreviation and Harris's articulation scheme, respectively) and on related metrics, suggesting that the choice of prior can guide emergent languages toward more natural structure. Overall, the paper offers a generative framework for EC that connects representation learning, cognitive theories, and language statistics, with implications for designing more interpretable and human-like emergent languages.

Abstract

As a sub-discipline of evolutionary and computational linguistics, emergent communication (EC) studies communication protocols, called emergent languages, arising in simulations where agents communicate. A key goal of EC is to give rise to languages that share statistical properties with natural languages. In this paper, we reinterpret Lewis's signaling game, a frequently used setting in EC, as a beta-VAE and reformulate its objective function as an ELBO. Consequently, we clarify the existence of prior distributions of emergent languages and show that the choice of prior can influence their statistical properties. Specifically, we address the properties of word lengths and segmentation, known as Zipf's law of abbreviation (ZLA) and Harris's articulation scheme (HAS), respectively. Emergent languages have been reported not to follow these properties when the conventional objective is used. We experimentally demonstrate that, with an appropriate prior distribution, more natural segments emerge, and our results suggest that the conventional objective prevents the languages from following ZLA and HAS.
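To make the reformulation in the abstract concrete, the following is a minimal sketch of a beta-weighted ELBO for a toy signaling game. It is an illustration under stated assumptions, not the paper's implementation: the message space is a small finite set (the paper considers variable-length messages with a language-model prior), and the names `elbo`, `p_obj`, `sender`, `receiver`, and `prior` are hypothetical. The sender plays the role of the encoder, the receiver the decoder, and the prior over messages the latent prior.

```python
import math


def elbo(p_obj, sender, receiver, prior, beta=1.0):
    """Exact beta-weighted ELBO for a tiny discrete signaling game.

    All names are illustrative assumptions, not taken from the paper.
    p_obj[x]       : probability of object x
    sender[x][m]   : S(m | x), the sender (encoder)
    receiver[m][x] : R(x | m), the receiver (decoder)
    prior[m]       : P(m), the prior over messages
    Assumes receiver[m][x] > 0 and prior[m] > 0 wherever sender[x][m] > 0.
    """
    total = 0.0
    for x, px in enumerate(p_obj):
        recon = 0.0  # E_{m ~ S(.|x)}[log R(x | m)]
        kl = 0.0     # KL(S(. | x) || P)
        for m, s in enumerate(sender[x]):
            if s == 0.0:
                continue  # zero-probability messages contribute nothing
            recon += s * math.log(receiver[m][x])
            kl += s * math.log(s / prior[m])
        total += px * (recon - beta * kl)
    return total


# Two objects, two messages, a perfect deterministic protocol, uniform prior.
p_obj = [0.5, 0.5]
sender = [[1.0, 0.0], [0.0, 1.0]]
receiver = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]
value = elbo(p_obj, sender, receiver, prior, beta=1.0)
```

In this toy case reconstruction is perfect, so the objective reduces to the negative KL term weighted by `beta`. Changing `prior` changes which protocols are cheap, which is the sense in which the choice of prior can shape the emergent language.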

Paper Structure (18 sections, 2 theorems, 45 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Background
Language Emergence via Lewis's Signaling Game
On the Statistical Properties of Languages
Implicit Prior Distribution and Inefficiency of Emergent Language
Relationship to Surprisal Theory
Experiment
Setup
Experimental Result
Discussion
Related Work
Conclusion
HAS-based Boundary Detection Algorithm
Proofs on the Existence of Implicit Priors
Supplemental Information on Experimental Setup
...and 3 more sections

Key Result

Lemma 1

The following equation holds: where $f_{\theta}:\mathcal{X}\times\mathcal{M}\to\mathbb{R}$ is any function differentiable w.r.t. $\theta$.

Figures (5)

Figure 1: Illustration of similarity between signaling games and (beta-)VAE.
Figure 2: Results for $n_{\textrm{bou}}$ (A1), $n_{\textrm{seg}}$ (A2), $\Delta_{\textrm{w},\textrm{c}}$ (A3), C-TopSim (A3), and W-TopSim (A3), shown in order from the left. The x-axis represents $(n_{\textrm{att}},n_{\textrm{val}})$ and the y-axis the value of each metric. The shaded regions and error bars represent the standard error of the mean. The $\textrm{threshold}$ parameter is set to $0$. The blue plots show the results for our ELBO-based objective $\mathcal{J}_{\textrm{ours}}$, the orange ones for the baseline that uses the conventional objective $\mathcal{J}_{\textrm{conv}}$ plus the entropy regularizer, and the grey ones for the baseline that uses the ELBO-based objective whose prior is $P^{\textrm{prior}}_{\alpha}$. The apparently inferior $\Delta_{\textrm{w},\textrm{c}}$ of $\mathcal{J}_{\textrm{ours}}$ relative to the baselines may be misleading: $\mathcal{J}_{\textrm{ours}}$ greatly improves both C-TopSim and W-TopSim, and the larger scale of these improvements can yield a seemingly worse $\Delta_{\textrm{w},\textrm{c}}$ without indicating poorer performance.
Figure 3: Results for $n_{\textrm{bou}}$ (A1), $n_{\textrm{seg}}$ (A2), $\Delta_{\textrm{w},\textrm{c}}$ (A3), C-TopSim (A3), and W-TopSim (A3), shown in order from the left. The x-axis represents $(n_{\textrm{att}},n_{\textrm{val}})$ and the y-axis the value of each metric. The shaded regions and error bars represent the standard error of the mean. The $\textrm{threshold}$ parameter is set to $0.25$. The blue plots show the results for our ELBO-based objective $\mathcal{J}_{\textrm{ours}}$, the orange ones for the baseline that uses the conventional objective $\mathcal{J}_{\textrm{conv}}$ plus the entropy regularizer, and the grey ones for the baseline that uses the ELBO-based objective whose prior is $P^{\textrm{prior}}_{\alpha}$. The apparently inferior $\Delta_{\textrm{w},\textrm{c}}$ of $\mathcal{J}_{\textrm{ours}}$ relative to the baselines may be misleading: $\mathcal{J}_{\textrm{ours}}$ greatly improves both C-TopSim and W-TopSim, and the larger scale of these improvements can yield a seemingly worse $\Delta_{\textrm{w},\textrm{c}}$ without indicating poorer performance.
Figure 4: Results for $n_{\textrm{bou}}$ (A1), $n_{\textrm{seg}}$ (A2), $\Delta_{\textrm{w},\textrm{c}}$ (A3), C-TopSim (A3), and W-TopSim (A3), shown in order from the left. The x-axis represents $(n_{\textrm{att}},n_{\textrm{val}})$ and the y-axis the value of each metric. The shaded regions and error bars represent the standard error of the mean. The $\textrm{threshold}$ parameter is set to $0.25$. The blue plots show the results for our ELBO-based objective $\mathcal{J}_{\textrm{ours}}$, the orange ones for the baseline that uses the conventional objective $\mathcal{J}_{\textrm{conv}}$ plus the entropy regularizer, and the grey ones for the baseline that uses the ELBO-based objective whose prior is $P^{\textrm{prior}}_{\alpha}$. The apparently inferior $\Delta_{\textrm{w},\textrm{c}}$ of $\mathcal{J}_{\textrm{ours}}$ relative to the baselines may be misleading: $\mathcal{J}_{\textrm{ours}}$ greatly improves both C-TopSim and W-TopSim, and the larger scale of these improvements can yield a seemingly worse $\Delta_{\textrm{w},\textrm{c}}$ without indicating poorer performance.
Figure 5: Mean message length sorted by objects' frequency across 32 random seeds. A moving average with a window size of 10 is shown for readability.
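
The curve described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function names and the `(frequency, lengths_across_seeds)` record layout are hypothetical, and sorting from most to least frequent is an assumption, since the caption does not state the direction.

```python
def mean_length_by_frequency(records):
    """Average message length per object, ordered by object frequency.

    records: list of (frequency, lengths_across_seeds) pairs (hypothetical layout).
    Assumption: objects are ordered from most to least frequent.
    """
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    return [sum(lengths) / len(lengths) for _, lengths in ordered]


def moving_average(values, window=10):
    """Simple moving average; yields len(values) - window + 1 points."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


# Toy data: two objects, two seeds each.
means = mean_length_by_frequency([(1, [4, 6]), (5, [1, 3])])
smoothed = moving_average([1, 2, 3, 4], window=2)
```

Under Zipf's law of abbreviation, the smoothed curve would be expected to rise as frequency falls, since frequent objects should receive shorter messages.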

Theorems & Definitions (6)

Remark 1
Lemma 1
Remark 2
Lemma 2
Remark 3: Curious Case on Length Penalty
Remark 4
