Relational Deep Reinforcement Learning

Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

TL;DR

The paper targets sample inefficiency and poor generalization in deep reinforcement learning by introducing relational inductive biases: a neural architecture represents entities and reasons about their relations via self-attention. This enables non-local, iterative relational reasoning that guides a model-free policy and yields interpretable internal computations. Empirically, the method achieves near-optimal performance on Box-World and state-of-the-art results on six StarCraft II mini-games, and it generalizes zero-shot to Box-World levels more complex than those seen in training. The work shows that combining relational learning principles with deep learning can address persistent challenges in RL, and it suggests directions for structured perception and planning in AI systems.

Abstract

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.

Paper Structure

This paper contains 9 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Box-World and StarCraft II tasks demand reasoning about entities and their relations.
  • Figure 2: Box-World agent architecture and multi-head dot-product attention. $E$ is a matrix that compiles the entities produced by the visual front-end; $f_{\theta}$ is a multilayer perceptron applied in parallel to each row of the output of an MHDPA step, $A$, and producing updated entities, $\widetilde{E}$.
  • Figure 3: Box-World task: example observations (left), underlying graph structure that determines the proper path to the goal and any distractor branches (middle) and training curves (right).
  • Figure 4: Visualization of attention weights. (a) The underlying graph of one example level; (b) the result of the analysis for that level, using each of the entities along the solution path (1--5) as the source of attention. Arrows point to the entities that the source is attending to. An arrow's transparency is determined by the corresponding attention weight.
  • Figure 5: Generalization in Box-World. Zero-shot transfer to levels that required: (a) opening a longer sequence of boxes; (b) using a key-lock combination that was never required during training.
  • ...and 2 more figures
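
The multi-head dot-product attention (MHDPA) step described in the Figure 2 caption can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: only the overall flow (entities $E$, attention output $A$, row-wise $f_{\theta}$, updated entities $\widetilde{E}$) comes from the caption. The weight shapes, the random initialisation, the scaling by the square root of the head dimension, the ReLU two-layer form of $f_{\theta}$, and all function names are hypothetical choices made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    """Gaussian-initialised matrix (an assumption; stands in for learned weights)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_head(E, Wq, Wk, Wv):
    """One dot-product attention head over the entity matrix E.

    Returns the head output and the entity-to-entity attention weights.
    """
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    d = len(Q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # one score per (source entity, target entity) pair
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V), weights


def relational_block(E, heads, W1, W2):
    """MHDPA step followed by an MLP applied to each row of A."""
    outputs = [attention_head(E, *head)[0] for head in heads]
    # Concatenate the head outputs entity by entity to form A.
    A = [sum((out[i] for out in outputs), []) for i in range(len(E))]
    # f_theta: assumed here to be a two-layer ReLU MLP without biases.
    hidden = [[max(0.0, v) for v in row] for row in matmul(A, W1)]
    return matmul(hidden, W2)


# Hypothetical sizes: 5 entities with 8 features, 2 heads of width 4.
rng = random.Random(0)
n_entities, dim, n_heads, head_dim, hidden_dim = 5, 8, 2, 4, 16
E = random_matrix(rng, n_entities, dim, scale=1.0)
heads = [
    tuple(random_matrix(rng, dim, head_dim) for _ in range(3))  # (Wq, Wk, Wv)
    for _ in range(n_heads)
]
W1 = random_matrix(rng, n_heads * head_dim, hidden_dim)
W2 = random_matrix(rng, hidden_dim, dim)

E_new = relational_block(E, heads, W1, W2)
_, weights = attention_head(E, *heads[0])
```

Because the block maps an entity matrix to another of the same shape, it can be applied repeatedly, which is one way to read the abstract's "iteratively reason". Whether weights are shared across iterations is left open in this sketch. Each row of `weights` is a distribution over entities, the kind of quantity that Figure 4 visualizes as arrows.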