Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence

Emilio Jorge; Mikael Kågebäck; Fredrik D. Johansson; Emil Gustavsson

TL;DR

This work investigates emergent grounded language by training two agents to play a collaborative Guess Who? game using Deep Recurrent Q-Networks and differentiable inter-agent learning. The agents are trained end to end with separate parameters, and they develop a discrete, grounded vocabulary and a multi-step dialogue that references visual concepts, with a noise curriculum promoting robust language grounding. Experiments on Guess Who? and CelebA show that larger vocabularies and memory-enabled interaction improve performance, and that the learned language aligns with visual attributes while supporting context-dependent meaning. The findings indicate that grounded language can emerge in interactive, visually grounded environments, and they demonstrate scalable, interpretable communication without predefined protocols.

Abstract

Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books (a corpus) would likely be impossible even for humans. Nevertheless, this is the dominant approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a nontrivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words (i.e., grounding) but also to hold a multi-step dialogue, remembering the state of the dialogue from step to step.
