Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
TL;DR
DDRQN tackles the problem of learning communication protocols among multiple agents operating under partial observability. It introduces three innovations: last-action inputs, inter-agent weight sharing, and disabled experience replay. Together these enable centralized learning of decentralized policies, producing a single shared Q-function conditioned on each agent's private history and agent ID. Empirical results on the Hats and Switch riddles show that DDRQN learns effective coordination and emergent communication, outperforming baselines and revealing interpretable strategies; ablation studies confirm that each component is critical. To the authors' knowledge, this is the first demonstration that deep reinforcement learning can discover communication protocols in multi-agent settings, which suggests a route toward scalable coordination in partially observable domains.
Abstract
We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks. In these tasks, the agents are not given any pre-designed communication protocol. Therefore, in order to successfully communicate, they must first automatically develop and agree upon their own communication protocol. We present empirical results on two multi-agent learning problems based on well-known riddles, demonstrating that DDRQN can successfully solve such tasks and discover elegant communication protocols to do so. To our knowledge, this is the first time deep reinforcement learning has succeeded in learning communication protocols. In addition, we present ablation experiments that confirm that each of the main components of the DDRQN architecture is critical to its success.
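
As a rough illustration of two of the components named in the TL;DR (last-action inputs and inter-agent weight sharing), the following minimal Python sketch shows one way a single recurrent Q-function could be shared by all agents and conditioned on each agent's private observation, previous action, and agent ID. Everything here is an assumption made for illustration and is not taken from the paper: the class name `SharedRecurrentQ`, the one-hot encodings, the plain tanh recurrent cell, the layer sizes, and the random untrained weights. Training is not shown; under the third component, updates would be computed from freshly generated episodes rather than from a replay buffer.

```python
import numpy as np


def one_hot(index, size):
    """Return a one-hot vector; an index of None yields all zeros."""
    v = np.zeros(size)
    if index is not None:
        v[index] = 1.0
    return v


class SharedRecurrentQ:
    """Hypothetical shared Q-function: one set of weights for every agent."""

    def __init__(self, n_obs, n_actions, n_agents, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.n_obs, self.n_actions, self.n_agents = n_obs, n_actions, n_agents
        self.hidden = hidden
        in_dim = n_obs + n_actions + n_agents
        # Random, untrained weights; a real model would learn these.
        self.W_in = rng.normal(0.0, 0.1, (hidden, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden))

    def initial_state(self):
        return np.zeros(self.hidden)

    def step(self, obs, last_action, agent_id, h):
        """Map (observation, last action, agent ID, hidden state) to Q-values."""
        x = np.concatenate([
            one_hot(obs, self.n_obs),
            one_hot(last_action, self.n_actions),  # last-action input
            one_hot(agent_id, self.n_agents),      # lets shared weights specialize
        ])
        h_next = np.tanh(self.W_in @ x + self.W_h @ h)  # summarizes private history
        q_values = self.W_out @ h_next
        return q_values, h_next


# Usage: four agents act greedily for three steps with the same network.
net = SharedRecurrentQ(n_obs=3, n_actions=2, n_agents=4)
states = [net.initial_state() for _ in range(4)]
last_actions = [None] * 4  # no previous action at the first step
for t in range(3):
    for m in range(4):
        q, states[m] = net.step(t % 3, last_actions[m], m, states[m])
        last_actions[m] = int(np.argmax(q))
```

Because each agent keeps its own hidden state and passes its own ID, the agents can behave differently even though they evaluate the same weights, which is the sense in which the learned Q-function is shared yet the resulting policies are decentralized.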
