A Compressive-Expressive Communication Framework for Compositional Representations
Rafael Elberg, Felipe del Rio, Mircea Petrache, Denis Parra
TL;DR
CELEBI studies how compositional representations arise in emergent communication by combining three inductive biases (Progressive Decoding, Final-State Imitation, and Pairwise Distance Maximization) within an iterated learning framework. A discrete bottleneck and a VAE backbone support a reconstruction-based game in which messages encode the generating factors in a structured way. On Shapes3D and MPI3D, CELEBI achieves higher Topographic Similarity, shorter useful message length, and better disentanglement metrics than baselines, supporting the claim that simple, pressure-driven biases can foster compositional, transferable communication protocols. Overall, the work offers theoretical and empirical evidence that the tension between expressivity and compressibility, when guided by principled regularizers, can drive the emergence of generalizable, compositional communication.
Abstract
Compositionality in knowledge and language, the ability to represent complex concepts as a combination of simpler ones, is a hallmark of human cognition and communication. Despite recent advances, deep neural networks still struggle to acquire this property reliably. Neural models for emergent communication aim to endow artificial agents with compositional language by simulating the pressures that shape human language. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositional representations through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence and the iterated learning framework, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, Progressive Decoding incentivizes intermediate reasoning by requiring the receiver to produce a partial reconstruction after each symbol. Second, Final-State Imitation trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck. Third, Pairwise Distance Maximization regularizes message diversity by encouraging large distances between messages, with formal links to entropy maximization. Our method significantly improves both the efficiency and the compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity. This work provides new theoretical and empirical evidence for the emergence of structured, generalizable communication protocols from simplicity-based inductive biases.
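To make the two message-level quantities named above concrete, the following is a minimal sketch of how topographic similarity and a pairwise-distance diversity score are commonly computed. It is an illustration under stated assumptions, not the authors' implementation: the use of Hamming distance for both factors and messages, the NumPy-only Spearman correlation, the toy 3x3 factor grid, and all function names (`pairwise_hamming`, `average_ranks`, `topographic_similarity`, `mean_pairwise_distance`) are hypothetical choices made here.

```python
import numpy as np

def pairwise_hamming(x):
    """Hamming distance for every unordered pair of rows in x (condensed vector)."""
    x = np.asarray(x)
    i, j = np.triu_indices(len(x), k=1)
    return (x[i] != x[j]).sum(axis=1).astype(float)

def average_ranks(v):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def topographic_similarity(factors, messages):
    """Spearman correlation between factor-space and message-space distances.

    Assumption: Hamming distance in both spaces; other metrics are possible.
    """
    r_f = average_ranks(pairwise_hamming(factors))
    r_m = average_ranks(pairwise_hamming(messages))
    return float(np.corrcoef(r_f, r_m)[0, 1])

def mean_pairwise_distance(messages):
    """Average Hamming distance between messages: the kind of diversity
    score a pairwise-distance regularizer would push upward (assumed form)."""
    return float(pairwise_hamming(messages).mean())

# Toy data (hypothetical): two generating factors with three values each.
factors = np.array([(a, b) for a in range(3) for b in range(3)])

# A compositional protocol: one symbol per factor.
compositional = factors.copy()
# A non-compositional protocol: same message set, assigned to the wrong inputs.
holistic = np.roll(factors, shift=-1, axis=0)
# A collapsed protocol: every input receives the same message.
collapsed = np.zeros_like(factors)

topsim_comp = topographic_similarity(factors, compositional)
topsim_hol = topographic_similarity(factors, holistic)
diversity_comp = mean_pairwise_distance(compositional)
diversity_collapsed = mean_pairwise_distance(collapsed)
```

In this toy setting the compositional protocol reproduces the factor distances exactly, so its topographic similarity is maximal, while reassigning the same messages to different inputs lowers it. The collapsed protocol has zero mean pairwise distance, which is the degenerate case a diversity regularizer is meant to rule out.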
