A Compressive-Expressive Communication Framework for Compositional Representations

Rafael Elberg, Felipe del Rio, Mircea Petrache, Denis Parra

TL;DR

CELEBI tackles the emergence of compositional representations in emergent communication by coupling three inductive biases—Progressive Decoding, Final-State Imitation, and Pairwise Distance Maximization—within an iterated learning framework. The discrete bottleneck and VAE backbone enable a reconstruction-based setup where messages encode generating factors in a structured way. Empirically, CELEBI yields higher Topographic Similarity and shorter useful message length on Shapes3D and MPI3D and improves disentanglement metrics relative to baselines, supporting the claim that simple, pressure-driven biases can foster compositional, transferable communication protocols. Overall, the work provides both theoretical and empirical evidence that expressivity-compressibility tensions, when guided by principled regularizers, can drive the emergence of generalizable, compositional communication.
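Topographic Similarity, the compositionality metric named above, is commonly computed as the Spearman rank correlation between pairwise distances in meaning space and pairwise distances in message space. The following is a minimal pure-Python sketch of that common definition, not the authors' evaluation code. Hamming distance on both sides and the names `topographic_similarity`, `factors`, and `messages` are assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(u, v):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def topographic_similarity(factors, messages):
    """Spearman correlation between meaning-space and message-space distances."""
    pairs = list(combinations(range(len(factors)), 2))
    d_meaning = [hamming(factors[i], factors[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return pearson(ranks(d_meaning), ranks(d_message))


# Toy data: two binary generating factors, two-symbol messages.
factors = [(0, 0), (0, 1), (1, 0), (1, 1)]
compositional = [(5, 7), (5, 8), (6, 7), (6, 8)]  # one symbol per factor
scrambled = [(5, 7), (6, 8), (6, 7), (5, 8)]      # same symbols, mapping shuffled
print(topographic_similarity(factors, compositional))
print(topographic_similarity(factors, scrambled))
```

In this toy case the message that assigns one symbol per factor preserves all pairwise distances, so it scores higher than the shuffled mapping.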

Abstract

Compositionality in knowledge and language--the ability to represent complex concepts as a combination of simpler ones--is a hallmark of human cognition and communication. Despite recent advances, deep neural networks still struggle to acquire this property reliably. Neural models for emergent communication look to endow artificial agents with compositional language by simulating the pressures that form human language. In this work, we introduce CELEBI (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositional representations through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence and the iterated learning framework, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, Progressive Decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, Final-State Imitation trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck. Third, Pairwise Distance Maximization regularizes message diversity by encouraging high distances between messages, with formal links to entropy maximization. Our method significantly improves both the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity. This work provides new theoretical and empirical evidence for the emergence of structured, generalizable communication protocols from simplicity-based inductive biases.
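To make the idea behind Pairwise Distance Maximization concrete, the sketch below scores a batch of discrete messages by their mean pairwise distance and turns that score into a penalty to minimize. The abstract does not specify the distance or how it is applied during training, so the normalized Hamming distance and the names `mean_pairwise_hamming` and `diversity_penalty` are assumptions made for illustration only.

```python
from itertools import combinations


def mean_pairwise_hamming(messages):
    """Average normalized Hamming distance over all unordered message pairs."""
    pairs = list(combinations(messages, 2))
    if not pairs:
        return 0.0
    total = sum(
        sum(a != b for a, b in zip(m1, m2)) / len(m1) for m1, m2 in pairs
    )
    return total / len(pairs)


def diversity_penalty(messages):
    """Hypothetical loss term: lower when messages are farther apart."""
    return -mean_pairwise_hamming(messages)


# A collapsed protocol (every input gets the same message) versus a spread-out one.
collapsed = [(1, 1, 1)] * 4
spread = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]
print(diversity_penalty(collapsed), diversity_penalty(spread))
```

Minimizing such a penalty pushes the sender away from reusing the same message for different inputs, which is the intuition behind the entropy-maximization link mentioned in the abstract.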

Paper Structure

This paper contains 49 sections, 6 theorems, 32 equations, 7 figures, 5 tables.

Key Result

Theorem D.1

Assume that $\mathcal{G}=[1:N]^n$ and that $\mathsf{GenX}:\mathcal{G}\to \mathcal{X}$ is an injective function, and that $\mathcal{D}_{train}\subseteq \mathcal{D}$ has cardinality $|\mathcal{D}_{train}|=p|\mathcal{D}|$ for some $p\in(0,1)$ such that $pN^n$ is an integer. We assume that the compositi... Then as $|\mathcal{G}|\to\infty$ the probability that $\mathcal{D}_{train}$ is sufficient to recons...

Figures (7)

  • Figure 1: (a) Overview of our proposed architecture. Interaction Phase: The sender $S_{\phi^t}$ and receiver $R_{\omega^t}$ are jointly trained to minimize the reconstruction error between the input state $x$ and the predicted states $\{x'\}$, by encoding $x$ into a message $m^t_x$. Imitation Phase: A new sender $S_{\phi^{t+1}}$ is trained to imitate the final predicted output of $R_{\omega^t}(m^t_x)$, while also maximizing pairwise message diversity to encourage exploration. (b) Qualitative example of image decoding: the receiver reconstructs the input image from each sub-message of the sender's message, giving a series of reconstructions. The useful length of a message corresponds to the first reconstruction whose error falls below a threshold $\epsilon>0$; in this example the last two message tokens are not useful, in that they do not improve reconstruction accuracy.
  • Figure 2: Effect of geometric penalty $\lambda$ on emergent communication. We introduce a geometric weighting term $\lambda$ that penalizes reconstruction error more strongly for longer messages. Increasing $\lambda$ initially improves useful length (a). However, once $\lambda$ passes a certain threshold, both useful length and reconstruction error (b) increase. Values of $\lambda$ close to 1.5 strike a favorable balance, achieving efficient and expressive communication.
  • Figure 3: Loss distribution over positions and $\lambda$ values for (a) Shapes3D and (b) MPI3D
  • Figure 4: Useful length values for different $\epsilon$ and $\lambda$ values for (a) Shapes3D and (b) MPI3D
  • Figure 5: Progressive image reconstructions obtained at each decoding step by the receiver as it processes successive symbols of the message.
  • ...and 2 more figures
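
The useful length defined in the Figure 1 caption and the geometric weighting described in the Figure 2 caption can be sketched as follows. The captions give only a verbal description, so the weight form $\lambda^i$ at position $i$, the normalization by the sum of weights, the fallback to the full message length when no prefix reaches the threshold, and the names `useful_length` and `geometric_weighted_loss` are all assumptions made for illustration.

```python
def useful_length(step_errors, epsilon):
    """1-based index of the first partial reconstruction with error below epsilon.

    Falls back to the full message length if no prefix reaches the threshold
    (an assumption made here for illustration).
    """
    for position, error in enumerate(step_errors, start=1):
        if error < epsilon:
            return position
    return len(step_errors)


def geometric_weighted_loss(step_errors, lam):
    """Weighted mean of per-position errors with weights lam**i (assumed form)."""
    weights = [lam ** i for i in range(len(step_errors))]
    return sum(w * e for w, e in zip(weights, step_errors)) / sum(weights)


# Hypothetical per-symbol reconstruction errors for a five-token message.
errors = [0.9, 0.4, 0.05, 0.05, 0.05]
print(useful_length(errors, epsilon=0.1))
print(geometric_weighted_loss(errors, lam=1.0))
print(geometric_weighted_loss(errors, lam=1.5))
```

With these toy errors the last two tokens add nothing once the threshold is reached, mirroring the situation described in Figure 1(b). Under the assumed weighting, $\lambda > 1$ puts more weight on errors at later positions, while $\lambda = 1$ reduces to a plain average.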

Theorems & Definitions (12)

  • Theorem D.1
  • Proposition D.2
  • proof
  • Theorem D.3
  • proof
  • Remark D.4: Extension of Prop. D.2 and Theorems D.1 and D.3 to general $\mathcal{G}$
  • Proposition E.1
  • proof
  • Definition F.1
  • Definition F.2
  • ...and 2 more