Concept-Best-Matching: Evaluating Compositionality in Emergent Communication
Boaz Carmeli, Yonatan Belinkov, Ron Meir
TL;DR
This work tackles the challenge of evaluating compositionality in emergent communication (EC) by introducing Concept Best Matching (CBM), which constructs a weighted bipartite graph between EC words and natural-language concepts and finds the optimal one-to-one mapping between them via the Hungarian algorithm. The resulting CBM score, normalized by $Q=\sum_{i\in D}\max(|m_i|,|l_i|)$, yields both a global measure of compositionality and an interpretable translation between words and concepts. Experiments on the Shape and Thing datasets with GS and QT communication show that CBM aligns with task accuracy and exposes sub-phenomena such as ambiguities and paraphrases, offering finer-grained insights than the traditional TopSim and AMI metrics. The results suggest that, while QT tends to perform better than GS, none of the setups achieves the level of compositionality seen in natural language, highlighting the gap between emergent protocols and human-like symbolic language. CBM provides a practical, interpretable diagnostic tool for analyzing EC systems and steering their development toward more compositional and human-aligned communication.
Abstract
Artificial agents that learn to communicate in order to accomplish a given task acquire communication protocols that are typically opaque to a human. A large body of work has attempted to evaluate the emergent communication via various evaluation measures, with \emph{compositionality} featuring as a prominent desired trait. However, current evaluation procedures do not directly expose the compositionality of the emergent communication. We propose a procedure to assess the compositionality of emergent communication by finding the best match between emerged words and natural language concepts. The best-match algorithm provides both a global score and a translation map from emergent words to natural language concepts. To the best of our knowledge, this is the first time that such a direct and interpretable mapping between emergent words and human concepts has been provided.
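
As a rough illustration of the best-match idea summarized above, the following Python sketch builds word-concept edge weights, searches for the one-to-one mapping with the largest total weight, and divides that total by $Q=\sum_{i\in D}\max(|m_i|,|l_i|)$. Several details are assumptions made for illustration and are not taken from the paper: edge weights are taken to be per-sample co-occurrence counts, $m_i$ and $l_i$ are read as the message and the concept list of sample $i$, the score is taken to be the matched weight divided by $Q$, and the function name `cbm_sketch` and the toy data are hypothetical. The matching itself is found by brute force over permutations, which returns the same optimum as the Hungarian algorithm on tiny inputs but is not the Hungarian algorithm.

```python
from collections import Counter
from itertools import permutations


def cbm_sketch(samples):
    """Toy best-match score; `samples` is a list of (message_words, concepts).

    Assumptions (not from the paper): edge weight = number of samples in
    which a word and a concept co-occur; score = matched weight / Q.
    """
    # Edge weights of the bipartite graph: co-occurrence counts.
    weights = Counter()
    for message, concepts in samples:
        for word in set(message):
            for concept in set(concepts):
                weights[(word, concept)] += 1

    words = sorted({w for message, _ in samples for w in message})
    concepts = sorted({c for _, labels in samples for c in labels})

    # Pad the shorter side with None so both sides have equal size.
    size = max(len(words), len(concepts))
    words_p = words + [None] * (size - len(words))
    concepts_p = concepts + [None] * (size - len(concepts))

    # Brute-force search for the maximum-weight one-to-one mapping.
    # Feasible only for very small vocabularies (factorial cost).
    best_total, best_map = -1, {}
    for perm in permutations(concepts_p):
        total = sum(weights[(w, c)] for w, c in zip(words_p, perm))
        if total > best_total:
            best_total = total
            best_map = {w: c for w, c in zip(words_p, perm)
                        if w is not None and c is not None
                        and weights[(w, c)] > 0}

    # Normalizer Q: sum over samples of max(|m_i|, |l_i|).
    q = sum(max(len(m), len(l)) for m, l in samples)
    score = best_total / q if q else 0.0
    return score, best_map


# Hypothetical, perfectly compositional toy protocol.
samples = [
    (["w1", "w2"], ["red", "circle"]),
    (["w1", "w3"], ["red", "square"]),
    (["w4", "w2"], ["blue", "circle"]),
    (["w4", "w3"], ["blue", "square"]),
]
score, mapping = cbm_sketch(samples)
print(score, mapping)
```

On this toy protocol every word co-occurs most often with exactly one concept, so the sketch recovers a clean word-to-concept translation and a score of 1.0. For realistic vocabulary sizes the brute-force loop would need to be replaced by a polynomial-time assignment solver, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`.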
