One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
Justin Fu, Sergey Levine, Pieter Abbeel
TL;DR
The paper addresses data-efficient one-shot learning for robotic manipulation by combining a neural network dynamics prior, learned from diverse prior tasks, with online adaptation of a local linear dynamics model. Planning uses model predictive control based on iterative LQR, which lets the controller correct for unmodeled variation while a new task is being attempted. Key contributions include a Bayesian framework for online dynamics fitting with priors, and an evaluation showing that neural network priors combined with online adaptation enable one-shot learning on contact-rich manipulation tasks. The approach is demonstrated on a real PR2 robot and in simulation, which suggests practical value for fast, data-efficient robotic skill acquisition and for sharing priors across tasks.
Abstract
One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand. Model-based reinforcement learning can achieve good sample efficiency, but requires the ability to learn a model of the dynamics that is good enough to learn an effective policy. In this work, we develop a model-based reinforcement learning algorithm that combines prior knowledge from previous tasks with online adaptation of the dynamics model. These two ingredients enable highly sample-efficient learning even in regimes where estimating the true dynamics is very difficult, since the online model adaptation allows the method to locally compensate for unmodeled variation in the dynamics. We encode the prior experience into a neural network dynamics model, adapt it online by progressively refitting a local linear model of the dynamics, and use model predictive control to plan under these dynamics. Our experimental results show that this approach can be used to solve a variety of complex robotic manipulation tasks in just a single attempt, using prior data from other manipulation behaviors.
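To make the idea of "progressively refitting a local linear model" concrete, the sketch below shows one plausible way to do it. It is an illustration under stated assumptions, not the authors' implementation. It keeps exponentially weighted running moments of the joint vector z = [x, u, x_next], blends them with a Gaussian prior (in the paper's setting the prior would come from the neural network; here it is simply passed in), and conditions the blended Gaussian to obtain x_next ~= F [x, u] + f. The class name `OnlineLinearDynamics` and the parameters `prior_weight` and `step` are hypothetical names chosen for this example.

```python
import numpy as np


class OnlineLinearDynamics:
    """Running Gaussian over z = [x, u, x_next], conditioned into x_next ~= F @ [x, u] + f.

    Illustrative sketch only: names and weighting scheme are assumptions.
    """

    def __init__(self, dim_x, dim_u, prior_mu, prior_sigma, prior_weight=1.0, step=0.1):
        self.n_in = dim_x + dim_u
        self.prior_mu = np.asarray(prior_mu, dtype=float)
        self.prior_sigma = np.asarray(prior_sigma, dtype=float)
        self.prior_weight = prior_weight  # hypothetical pseudo-count given to the prior
        self.step = step                  # hypothetical moving-average step size
        self.mu = None                    # running mean of z
        self.second = None                # running mean of outer(z, z)

    def update(self, x, u, x_next):
        """Fold one observed transition into the running moments."""
        z = np.concatenate([x, u, x_next])
        if self.mu is None:
            self.mu, self.second = z.copy(), np.outer(z, z)
        else:
            self.mu = (1 - self.step) * self.mu + self.step * z
            self.second = (1 - self.step) * self.second + self.step * np.outer(z, z)

    def linear_model(self):
        """Return (F, f) from the prior-blended Gaussian; prior only if no data yet."""
        if self.mu is None:
            mu, sigma = self.prior_mu, self.prior_sigma
        else:
            w = self.prior_weight
            emp_sigma = self.second - np.outer(self.mu, self.mu)
            diff = self.mu - self.prior_mu
            # Moments of a two-component mixture: data (weight 1) and prior (weight w).
            mu = (self.mu + w * self.prior_mu) / (1 + w)
            sigma = (emp_sigma + w * self.prior_sigma
                     + (w / (1 + w)) * np.outer(diff, diff)) / (1 + w)
        n = self.n_in
        # Gaussian conditioning: F = Sigma_out,in @ inv(Sigma_in,in).
        F = np.linalg.solve(sigma[:n, :n], sigma[:n, n:]).T
        f = mu[n:] - F @ mu[:n]
        return F, f
```

In a control loop, `update` would be called after every executed action and `linear_model` before each replanning step, so that the planner (iterative LQR in the paper) always uses the most recent local fit. A larger `prior_weight` keeps the fit closer to the prior when little data has been seen; how the paper actually sets this trade-off is not stated in the abstract.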
