Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Sergey Levine

TL;DR

The paper tackles data inefficiency in deep RL for robotic skills by proposing modular policy networks that split the policy representation into task-specific and robot-specific modules. Recombining pretrained modules enables cross-robot and cross-task transfer, including zero-shot generalization to robot-task combinations not seen during training. Trained with guided policy search (GPS) across multiple simulated robots and tasks, the approach demonstrates zero-shot transfer on visuomotor and manipulation tasks and accelerates learning on held-out combinations. The work also uses regularization to keep the interface between modules invariant, and discusses future directions toward lifelong, scalable multi-robot transfer with larger repertoires of robots and tasks.

Abstract

Reinforcement learning (RL) can automate a wide variety of robotic skills, but learning each new skill requires considerable real-world data collection and manual representation engineering to design policy classes or features. Using deep reinforcement learning to train general purpose neural network policies alleviates some of the burden of manual representation engineering by using expressive policy classes, but exacerbates the challenge of data collection, since such methods tend to be less efficient than RL with low-dimensional, hand-designed representations. Transfer learning can mitigate this problem by enabling us to transfer information from one skill to another and even from one robot to another. We show that neural network policies can be decomposed into "task-specific" and "robot-specific" modules, where the task-specific modules are shared across robots, and the robot-specific modules are shared across all tasks on that robot. This allows for sharing task information, such as perception, between robots and sharing robot information, such as dynamics and kinematics, between tasks. We exploit this decomposition to train mix-and-match modules that can solve new robot-task combinations that were not seen during training. Using a novel neural network architecture, we demonstrate the effectiveness of our transfer method for enabling zero-shot generalization with a variety of robots and tasks in simulation for both visual and non-visual tasks.
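The mix-and-match idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the observation split, the names (`make_mlp`, `forward`, `policy`, `INTERFACE_DIM`), and the choice to feed the task module's output together with a robot-specific observation into the robot module are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Return a list of (W, b) pairs for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU MLP forward pass; the final layer is linear."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical dimensions, chosen only for this sketch.
TASK_OBS_DIM = 6                          # task-specific observation
INTERFACE_DIM = 4                         # output of a task module
ROBOT_OBS_DIM = {"3dof": 6, "4dof": 8}    # e.g. joint angles and velocities
ACTION_DIM = {"3dof": 3, "4dof": 4}       # one torque per joint

# One module per task and one per robot; each is stored exactly once, so
# every policy that uses a module shares its weights.
task_modules = {t: make_mlp([TASK_OBS_DIM, 16, INTERFACE_DIM])
                for t in ("drawer", "push")}
robot_modules = {r: make_mlp([INTERFACE_DIM + ROBOT_OBS_DIM[r], 16, ACTION_DIM[r]])
                 for r in ROBOT_OBS_DIM}

def policy(task, robot, task_obs, robot_obs):
    """Compose a task module and a robot module into one policy."""
    z = forward(task_modules[task], task_obs)   # shared interface features
    return forward(robot_modules[robot], np.concatenate([z, robot_obs]))

# Worlds that would be trained end-to-end, and one held-out combination.
training_worlds = [("drawer", "3dof"), ("drawer", "4dof"), ("push", "3dof")]
held_out_world = ("push", "4dof")

# Zero-shot composition: reuse the "push" and "4dof" modules together.
action = policy(*held_out_world,
                np.ones(TASK_OBS_DIM), np.ones(ROBOT_OBS_DIM["4dof"]))
print(action.shape)
```

Because the dictionaries hold a single copy of each module, any gradient update applied while training one world would also change every other world that uses the same task or robot module, which is the weight sharing the abstract describes.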

Paper Structure

This paper contains 14 sections, 6 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The 3DoF and 4DoF robots, which specify one degree of variation (robots) in the universe described in Section \ref{sec:modpolnet}, as well as the tasks of opening a drawer and pushing a block, which specify the other degree of variation (tasks) in the universe.
  • Figure 2: The possible worlds enumerated for all combinations of tasks and robots for the universe described in Section \ref{sec:modpolnet}.
  • Figure 3: Modular policy composition for a universe with 2 tasks and 2 robots. There are 4 available modules (2 task modules and 2 robot modules), and each module is a neural network. For the training worlds, these modules are composed together to form the individual policy networks. Modules of the same color share their weights. Policy networks for the same task share task modules and those for the same robot share robot modules. The training worlds are composed and then trained end-to-end. On encountering an unseen world, the appropriate previously trained modules are composed to give a policy capable of good zero-shot performance.
  • Figure 4: Basic visuomotor policy network for a single robot. The two convolutional layers and spatial softmax form the task module, while the last few fully connected layers form the robot module
  • Figure 5: Grid of tasks vs. robots for the reaching colored blocks task in simulation described in Section \ref{sec:colorblocks}. We train on all the worlds besides the 4-link robot reaching to black, and test on this held-out world.
  • ...and 4 more figures
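
The spatial softmax named in the Figure 4 caption can be sketched as follows. This is a generic version of the operation, not code from the paper: each feature-map channel is turned into a probability distribution over pixel locations, and the expected (x, y) position under that distribution becomes the channel's output. The function name `spatial_softmax` and the normalization of coordinates to [-1, 1] are assumptions made for illustration.

```python
import numpy as np

def spatial_softmax(features):
    """Map feature maps of shape (C, H, W) to expected (x, y) points, shape (C, 2)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)   # subtract max for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)       # softmax over all pixels of a channel
    probs = probs.reshape(c, h, w)

    # Pixel coordinates normalized to [-1, 1] (an assumed convention).
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)

    # Marginalize over rows to get a distribution over columns, and vice versa.
    expected_x = (probs.sum(axis=1) * xs).sum(axis=1)
    expected_y = (probs.sum(axis=2) * ys).sum(axis=1)
    return np.stack([expected_x, expected_y], axis=1)

# A single channel with one strong activation in the top-right corner.
fmap = np.zeros((1, 5, 5))
fmap[0, 0, 4] = 50.0
points = spatial_softmax(fmap)
print(points)
```

The output has two numbers per channel regardless of image resolution, which gives the fully connected layers that follow a compact, position-based description of the scene.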