One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

Justin Fu, Sergey Levine, Pieter Abbeel

TL;DR

The paper addresses data-efficient one-shot learning for robotic manipulation by combining a neural-network dynamics prior, learned from a diverse set of earlier tasks, with online adaptation of a local linear dynamics model. Actions are planned with model predictive control based on iterative LQR, so the controller can correct for unmodeled variation while a new task is being attempted. The main contributions are a Bayesian framework for fitting dynamics online under a prior and an evaluation showing that neural-network priors combined with online adaptation enable one-shot learning of contact-rich manipulation tasks. The approach is demonstrated on a real PR2 robot and in simulation, which suggests it is practical for fast, data-efficient skill acquisition and for sharing priors across tasks.

Abstract

One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand. Model-based reinforcement learning can achieve good sample efficiency, but requires the ability to learn a model of the dynamics that is good enough to learn an effective policy. In this work, we develop a model-based reinforcement learning algorithm that combines prior knowledge from previous tasks with online adaptation of the dynamics model. These two ingredients enable highly sample-efficient learning even in regimes where estimating the true dynamics is very difficult, since the online model adaptation allows the method to locally compensate for unmodeled variation in the dynamics. We encode the prior experience into a neural network dynamics model, adapt it online by progressively refitting a local linear model of the dynamics, and use model predictive control to plan under these dynamics. Our experimental results show that this approach can be used to solve a variety of complex robotic manipulation tasks in just a single attempt, using prior data from other manipulation behaviors.
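
The adaptation step described in the abstract can be illustrated with a minimal sketch. It assumes the state, action, and next state are stacked into one vector z = [x; u; x'] and treated as jointly Gaussian, so that conditioning on [x; u] gives a local linear model x' ≈ F [x; u] + f. The names `RunningMoments`, `blend_with_prior`, and `fit_local_linear`, the exponential moving average, and the normal-inverse-Wishart-style blend with prior parameters `mu0`, `phi`, `m`, and `n0` are assumptions made for illustration, not the authors' implementation. The paper uses the neural network as the prior; here fixed placeholder values stand in for it.

```python
import numpy as np


class RunningMoments:
    """Exponential moving averages of E[z] and E[z z^T] (an assumed estimator)."""

    def __init__(self, dim, alpha=0.1):
        self.alpha = alpha
        self.mean = np.zeros(dim)
        self.second = np.eye(dim)  # running estimate of E[z z^T]

    def update(self, z):
        a = self.alpha
        self.mean = (1 - a) * self.mean + a * z
        self.second = (1 - a) * self.second + a * np.outer(z, z)

    def cov(self):
        return self.second - np.outer(self.mean, self.mean)


def blend_with_prior(mu0, phi, m, n0, mu_emp, sigma_emp, n):
    """Blend empirical moments with prior moments (NIW-style, assumed form).

    mu0, phi: prior mean and scale matrix; m, n0: prior strengths;
    n: effective number of online samples. Requires m + n > 0 and n + n0 > 0.
    """
    mu = (m * mu0 + n * mu_emp) / (m + n)
    d = mu_emp - mu0
    sigma = (phi + n * sigma_emp + (n * m / (n + m)) * np.outer(d, d)) / (n + n0)
    return mu, sigma


def fit_local_linear(mu, sigma, dx, du, reg=1e-8):
    """Condition the Gaussian over [x; u; x'] on [x; u] to get x' ~ F [x; u] + f."""
    k = dx + du
    s_in = sigma[:k, :k] + reg * np.eye(k)  # cov of [x; u], lightly regularized
    s_cross = sigma[:k, k:]                 # cov between [x; u] and x'
    F = np.linalg.solve(s_in, s_cross).T
    f = mu[k:] - F @ mu[:k]
    return F, f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dx, du = 2, 1
    A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy "true" dynamics
    B = np.array([[0.0], [0.1]])
    moments = RunningMoments(dx + du + dx, alpha=0.05)
    # Placeholder prior; in the paper's setting this would come from the network.
    mu0, phi = np.zeros(5), np.eye(5)
    x = np.zeros(dx)
    for _ in range(300):
        u = rng.normal(size=du)
        x_next = A @ x + B @ u + 0.01 * rng.normal(size=dx)
        moments.update(np.concatenate([x, u, x_next]))
        mu, sigma = blend_with_prior(mu0, phi, 1.0, 1.0,
                                     moments.mean, moments.cov(), 20.0)
        F, f = fit_local_linear(mu, sigma, dx, du)  # refit at every step
        x = x_next
    print("F =", F)
    print("f =", f)
```

In a full controller, the refitted `F` and `f` would be handed to the planner at each time step. The planner itself (MPC with iterative LQR in the paper) is omitted here.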

Paper Structure

This paper contains 18 sections, 9 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Diagram of our method: the robot uses prior experience from other tasks (a) to fit a neural network model of the dynamics of object interaction tasks (b). When faced with a new task (c), our algorithm learns a new model online during task execution, using the neural network as a prior. This new model is used to plan actions, allowing for one-shot learning of new skills.
  • Figure 2: Diagram of the neural network architectures used in our experiments. We found that using a short temporal context as input, as shown in network (2), improved the results for manipulation tasks that involved contact dynamics. Both networks produce accelerations which are used to predict the next state.
  • Figure 3: Simulated tasks used for evaluation: (a) cylindrical peg insertion, (b) cross-shaped peg insertion, (c) stacking over a cylindrical peg, (d) stacking over a square-shaped peg.
  • Figure 4: Tasks used in our evaluation: (a) inserting a toy nail into a toolbench, (b) inserting the nail with a high-friction surface to increase difficulty, (c) placing a wooden ring on a tight-fitting peg, (d) stacking toy blocks, (e) putting together part of a gear assembly, (f) assembling a toy airplane and (g) a toy car.
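
The Figure 2 caption notes that both networks output accelerations, which are then used to predict the next state. A minimal sketch of that last step follows, assuming the state is laid out as [positions; velocities] and that a simple explicit Euler step is used. The layout, the step size `dt`, and the function names are assumptions for illustration, and `accel_fn` is a hypothetical stand-in for the trained network.

```python
import numpy as np


def predict_next_state(state, action, accel_fn, dt=0.05):
    """Integrate a predicted acceleration into a next-state prediction.

    state: array laid out as [q; qdot] (assumed layout).
    accel_fn: callable (state, action) -> acceleration, standing in for the network.
    """
    state = np.asarray(state, dtype=float)
    half = state.shape[0] // 2
    q, qdot = state[:half], state[half:]
    accel = np.asarray(accel_fn(state, action), dtype=float)
    q_next = q + dt * qdot          # explicit Euler step on positions
    qdot_next = qdot + dt * accel   # explicit Euler step on velocities
    return np.concatenate([q_next, qdot_next])


if __name__ == "__main__":
    # Toy stand-in for the network: acceleration equals the commanded action.
    toy_accel = lambda s, a: np.asarray(a, dtype=float)
    print(predict_next_state([0.0, 1.0], [2.0], toy_accel, dt=0.1))
```

A network with temporal context, like network (2) in Figure 2, would also take a short history of recent states and actions as input. That input is left out here to keep the sketch short.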