The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication
Tom Kouwenhoven, Max Peeperkorn, Bram van Dijk, Tessa Verhoef
TL;DR
This work investigates why emergent visio-linguistic communication in neural agents often fails to ground language in human-like concepts. It introduces representational alignment as a central factor, showing that inter-agent alignment rises while grounding to input features decays, and that topsim correlates with alignment rather than true compositional structure. The authors propose a differentiable alignment penalty, $L_{\textsc{rsa}}$, to mitigate drift without sacrificing communicative success, and find that higher topsim does not necessarily improve performance on strict compositional benchmarks like Winoground. The study emphasizes reporting Representational Similarity Analysis (RSA) alongside topsim to properly interpret emergent communication results and recommends targeted, strict evaluation datasets to assess visio-linguistic compositional reasoning.
Abstract
Natural language has the universal properties of being compositional and grounded in reality. The emergence of linguistic properties is often investigated through simulations of emergent communication in referential games. However, these experiments have yielded mixed results compared to similar experiments addressing linguistic properties of human language. Here we address representational alignment as a potential contributing factor to these results. Specifically, we assess the representational alignment between agent image representations and between agent representations and input images. Doing so, we confirm that the emergent language does not appear to encode human-like conceptual visual features, since agent image representations drift away from inputs whilst inter-agent alignment increases. We moreover identify a strong relationship between inter-agent alignment and topographic similarity, a common metric for compositionality, and address its consequences. To address these issues, we introduce an alignment penalty that prevents representational drift but interestingly does not improve performance on a compositional discrimination task. Together, our findings emphasise the key role representational alignment plays in simulations of language emergence.
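To make the notion of representational alignment concrete, the following is a minimal sketch of an RSA score between two sets of representations of the same images (for example, sender versus receiver, or agent versus input features). It is an illustration under stated assumptions, not the authors' implementation: the choice of cosine distance and Spearman rank correlation, the function names (`rsa_score`, `pairwise_distances`, `ranks`), and the toy vectors are all hypothetical, since this section does not specify the exact configuration used.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors (assumed distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(reps):
    """Flattened upper triangle of the pairwise distance matrix over items."""
    return [
        cosine_distance(reps[i], reps[j])
        for i, j in combinations(range(len(reps)), 2)
    ]


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rsa_score(reps_a, reps_b):
    """Spearman correlation between the two pairwise-distance profiles.

    reps_a[i] and reps_b[i] must describe the same image i.
    """
    return pearson(ranks(pairwise_distances(reps_a)),
                   ranks(pairwise_distances(reps_b)))


if __name__ == "__main__":
    # Hypothetical toy representations of four images.
    sender = [[1.0, 0.0], [0.8, 0.3], [0.2, 1.0], [-0.5, 0.7]]
    receiver = [[0.9, 0.1], [0.7, 0.2], [0.1, 0.8], [-0.2, 0.9]]
    print("inter-agent RSA:", rsa_score(sender, receiver))
```

Because the score compares distance structure rather than raw coordinates, the two representation spaces need not share dimensionality. Under this reading, inter-agent alignment would be `rsa_score` over sender and receiver image representations, while drift from the inputs would show up as a falling `rsa_score` between an agent's representations and the input image features.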
