Lewis's Signaling Game as beta-VAE For Natural Word Lengths and Segments
Ryo Ueda, Tadahiro Taniguchi
TL;DR
This work addresses the gap between the statistics of emergent languages in emergent communication (EC) and those of natural languages by reinterpreting Lewis's signaling game as a $\beta$-VAE and optimizing an ELBO with a learnable prior. The prior over messages acts as a language model, which allows variable-length messages and gives a principled trade-off between informativeness and processing cost, motivated by surprisal theory. Empirical results show improvements in properties of word lengths and segmentation (Zipf's law of abbreviation and Harris's articulation scheme, respectively) and in related metrics, suggesting that the choice of prior can guide emergent languages toward more natural structures. Overall, the paper offers a principled, generative framework for EC that connects representation learning, cognitive theories, and language statistics, with implications for designing more interpretable and human-like emergent languages.
Abstract
As a sub-discipline of evolutionary and computational linguistics, emergent communication (EC) studies communication protocols, called emergent languages, arising in simulations where agents communicate. A key goal of EC is to give rise to languages that share statistical properties with natural languages. In this paper, we reinterpret Lewis's signaling game, a frequently used setting in EC, as a beta-VAE and reformulate its objective function as an evidence lower bound (ELBO). Consequently, we clarify the existence of prior distributions of emergent languages and show that the choice of prior can influence their statistical properties. Specifically, we address the properties of word lengths and segmentation, known as Zipf's law of abbreviation (ZLA) and Harris's articulation scheme (HAS), respectively. Emergent languages have been reported not to follow these properties under the conventional objective. We experimentally demonstrate that, with an appropriate prior distribution, more natural segments emerge, and our results suggest that the conventional objective prevents the languages from following ZLA and HAS.
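For readers who want the reformulation in symbols, it can be sketched as follows. The notation is assumed here for illustration and does not come from the abstract: $x$ is the object the sender observes, $m$ is a message, $S_\phi(m \mid x)$ is the sender (playing the role of the encoder), $R_\theta(x \mid m)$ is the receiver (the decoder), and $P_\theta(m)$ is the learnable prior over messages (sharing parameters with the receiver is an assumption). Under these names, a standard $\beta$-VAE objective to be maximized reads:

$$
\mathcal{J}(\phi, \theta; \beta) = \mathbb{E}_{x}\Big[\, \mathbb{E}_{m \sim S_\phi(\cdot \mid x)}\big[\log R_\theta(x \mid m)\big] - \beta \, \mathrm{KL}\big(S_\phi(\cdot \mid x) \,\|\, P_\theta\big) \Big]
$$

The KL term splits into an expected surprisal under the prior minus the sender's entropy:

$$
\mathrm{KL}\big(S_\phi(\cdot \mid x) \,\|\, P_\theta\big) = \mathbb{E}_{m \sim S_\phi(\cdot \mid x)}\big[-\log P_\theta(m)\big] - \mathcal{H}\big(S_\phi(\cdot \mid x)\big)
$$

Read this way, the reconstruction term rewards informative messages, while $-\log P_\theta(m)$ penalizes messages that are surprising under the prior. This is one way to see how the choice of prior could shape statistical properties such as message length.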
