Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning

Martin Riedmiller, Andrea Gesmundo, Tim Hertweck, Roland Hafner

TL;DR

This work introduces the dispatcher/ executor principle for the design of multi-task Reinforcement Learning controllers. It suggests partitioning the controller into two entities: one that understands the task (the dispatcher) and one that computes the controls for the specific device (the executor).

Abstract

Humans instinctively know how to neglect details when it comes to solving complex decision-making problems in environments with unforeseeable variations. This abstraction process seems to be a vital property of most biological systems and helps to 'abstract away' unnecessary details and boost generalisation. In this work we introduce the dispatcher/ executor principle for the design of multi-task Reinforcement Learning controllers. It suggests partitioning the controller into two entities, one that understands the task (the dispatcher) and one that computes the controls for the specific device (the executor), and connecting the two by a strongly regularizing communication channel. The core rationale behind this position paper is that changes in structure and design principles can improve generalisation properties and drastically increase data efficiency. It is in some sense a 'yes, and ...' response to the current trend of training large neural networks on vast amounts of data and betting on emerging generalisation properties. While we agree on the power of scaling, in the sense of Sutton's 'bitter lesson', we give some evidence that considering structure and adding design principles can be a valuable and critical component, in particular when data is not abundant and infinite but a precious resource.

Paper Structure (16 sections, 12 figures, 1 table)

Figures (12)

  • Figure 1: Illustration of the D/E architecture. The controller consists of two modules, the dispatcher and the executor. The dispatcher receives the current observation and modifies it according to the task description. The resulting abstract message is sent to the executor, which computes the corresponding action for the device.
  • Figure 2: Simulation setup. From left to right: original setup with 3 objects, "Four cubes", "One cube", "Recolor". The controllers are trained to lift the red object (single task), the red, left or green object (multi-task), and are also evaluated in situations not seen during training.
  • Figure 3: Single-task training: 'lift red'. Both the standard architecture and the D/E architecture are trained for 20,000 episodes. The figure shows evaluation results for different tasks. The standard architecture solves only the learned task, whereas the D/E architecture can also lift green and blue cubes.
  • Figure 4: Multi-task training: 'lift red/ green/ blue'. After 20k episodes, the D/E controller masters all three tasks, whereas the standard controller still performs poorly (orange). After 60k episodes, using three times as much training data, the performance of the standard controller (green) has increased significantly but still does not reach the D/E results.
  • Figure 5: Multi-task training: 'lift red/ green/ blue' with varying object shapes. The D/E architecture with different representations performs well after 50k episodes. The standard architecture has low performance after 50k episodes and, even after 150k episodes, is worse than D/E.
  • ...and 7 more figures
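
The split described in the Figure 1 caption can be illustrated with a small toy sketch. It is not the authors' implementation: the observation format (a mapping from colour to position), the task strings, the `Message` type, and the proportional move in `executor` are all assumptions made for illustration. The point it tries to show is that the executor only ever sees the abstract message, so details such as colour never reach it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Position = Tuple[float, float]


@dataclass(frozen=True)
class Message:
    """Abstract message passed over the channel (hypothetical format)."""
    target: Position


def dispatcher(observation: Dict[str, Position], task: str) -> Message:
    """Use the task description to keep only the task-relevant object."""
    # Assumed task encoding: the last word names the target, e.g. "lift red".
    color = task.split()[-1]
    return Message(target=observation[color])


def executor(message: Message, gripper: Position, step: float = 0.5) -> Position:
    """Compute a device action from the message alone (toy proportional move)."""
    dx = message.target[0] - gripper[0]
    dy = message.target[1] - gripper[1]
    return (step * dx, step * dy)


def controller(observation: Dict[str, Position], task: str,
               gripper: Position) -> Position:
    """Dispatcher and executor chained; the message is the only link."""
    return executor(dispatcher(observation, task), gripper)


if __name__ == "__main__":
    scene = {"red": (1.0, 2.0), "green": (3.0, 4.0)}
    recolored = {"blue": (1.0, 2.0)}
    # Same target position, different colour: the executor sees the same message.
    print(controller(scene, "lift red", (0.0, 0.0)))
    print(controller(recolored, "lift blue", (0.0, 0.0)))
```

In this toy setting, a new colour only requires the dispatcher to resolve a different key; the executor is reused unchanged, which is the kind of reuse the narrow channel is meant to encourage.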