Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Chelsea Finn, Sergey Levine, Pieter Abbeel

TL;DR

This paper tackles learning from demonstrations when the system dynamics are unknown and the task cost is hard to specify. It introduces Guided Cost Learning (GCL), a framework that jointly learns a nonlinear cost function (represented by a neural network) and a policy. It interleaves IOC updates with policy optimization so that the policy adaptively samples informative trajectories. The key contributions are a nonlinear, sample-based MaxEnt IOC objective with importance weighting, adaptive sampling using time-varying linear models, and regularization strategies that prevent overfitting. The method is demonstrated on simulated tasks and on real robotic manipulation with torque control and vision. The results show that GCL handles more complex tasks and is more sample-efficient than prior IOC methods, which makes learning from demonstrations practical for real-world robotic systems.
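Adaptive sampling with time-varying linear models can be illustrated with a small rollout routine. The following is a minimal sketch, not the paper's code. It assumes linear dynamics x_{t+1} = A_t x_t + B_t u_t and a linear-Gaussian controller u_t ~ N(K_t x_t + k_t, Σ_t). It rolls out one trajectory and accumulates the log-density of the sampled actions. The function name `sample_tvlg` and the parameters `A`, `B`, `K`, `k`, `Sigma` are hypothetical. The action log-density is the controller's contribution to log q(τ), the sampling density used in the importance weights. This assumes the dynamics terms are shared and cancel.

```python
import numpy as np


def sample_tvlg(x0, A, B, K, k, Sigma, rng):
    """Roll out one trajectory from a time-varying linear-Gaussian controller.

    Hypothetical setup (not the paper's code):
      dynamics:   x_{t+1} = A[t] @ x_t + B[t] @ u_t   (noise-free for simplicity)
      controller: u_t ~ N(K[t] @ x_t + k[t], Sigma[t])
    Returns the visited states, the actions, and the summed log-density of the
    actions under the controller.
    """
    x = np.asarray(x0, dtype=float)
    states, actions = [x], []
    log_q = 0.0
    for t in range(len(K)):
        mean = K[t] @ x + k[t]
        u = rng.multivariate_normal(mean, Sigma[t])
        # Gaussian log-density of the sampled action at time t.
        diff = u - mean
        _, logdet = np.linalg.slogdet(Sigma[t])
        log_q += -0.5 * (diff @ np.linalg.solve(Sigma[t], diff)
                         + logdet + len(mean) * np.log(2.0 * np.pi))
        # Step the (assumed) linear dynamics forward.
        x = A[t] @ x + B[t] @ u
        states.append(x)
        actions.append(u)
    return np.array(states), np.array(actions), float(log_q)


# Usage: 2-D state, 1-D action, horizon 3, zero-mean unit-variance controller.
rng = np.random.default_rng(0)
T = 3
A = [np.eye(2)] * T
B = [np.array([[0.0], [1.0]])] * T
K = [np.zeros((1, 2))] * T
k = [np.zeros(1)] * T
Sigma = [np.eye(1)] * T
states, actions, log_q = sample_tvlg(np.zeros(2), A, B, K, k, Sigma, rng)
```

In practice, the linear models would be refit from recent samples at each iteration, and the controller would be improved against the current cost. That step is not shown here.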

Abstract

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
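The sample-based MaxEnt IOC approximation estimates the partition function Z = ∫ exp(-c_θ(τ)) dτ by importance sampling from a sampling distribution q(τ). The resulting negative log-likelihood is roughly L(θ) ≈ (1/N) Σ_i c_θ(τ_i) + log((1/M) Σ_j exp(-c_θ(τ_j)) / q(τ_j)). Here the τ_i are the N demonstrations and the τ_j are M sampled trajectories. Below is a minimal NumPy sketch of that expression. It assumes the trajectory costs and the sample log-densities log q(τ_j) are already computed. The function names are hypothetical, and details such as how the sample set is assembled are omitted.

```python
import numpy as np


def _logsumexp(a):
    # Numerically stable log(sum(exp(a))).
    a = np.asarray(a, dtype=float)
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def maxent_ioc_loss(demo_costs, sample_costs, sample_log_q):
    """Importance-sampled MaxEnt IOC negative log-likelihood (sketch).

    demo_costs:   c_theta(tau_i) for the N demonstrations
    sample_costs: c_theta(tau_j) for the M sampled trajectories
    sample_log_q: log q(tau_j), log-density of each sample under the sampler
    Returns mean_i c(tau_i) + log((1/M) * sum_j exp(-c(tau_j)) / q(tau_j)).
    """
    log_w = -np.asarray(sample_costs, float) - np.asarray(sample_log_q, float)
    log_z = _logsumexp(log_w) - np.log(len(log_w))
    return float(np.mean(demo_costs) + log_z)


def normalized_weights(sample_costs, sample_log_q):
    # Self-normalized importance weights w_j / sum_k w_k, computed in log space.
    log_w = -np.asarray(sample_costs, float) - np.asarray(sample_log_q, float)
    return np.exp(log_w - _logsumexp(log_w))
```

Working in log space keeps the raw weights exp(-c)/q from overflowing. The normalized weights are, up to sign, the gradient of the log-partition estimate with respect to each sample's cost. A neural-network cost would therefore be trained by backpropagating through these quantities.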

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, 1 table, and 2 algorithms.

Figures (5)

  • Figure 1: Right: Guided cost learning uses policy optimization to adaptively sample trajectories for estimating the IOC partition function. Bottom left: PR2 learning to gently place a dish in a plate rack.
  • Figure 2: Comparison to prior work on simulated 2D navigation, reaching, and peg insertion tasks. Reported performance is averaged over 4 runs of IOC on 4 different initial conditions. For peg insertion, the depth of the hole is 0.1 m, marked as a dashed line; final distances larger than this indicate that the peg was not inserted.
  • Figure 3: Dish placement and pouring tasks. The robot learned to place the plate gently into the correct slot, and to pour almonds, localizing the target cup using unsupervised visual features. A video of the learned controllers can be found at http://rll.berkeley.edu/gcl
  • Figure 4: KL divergence between the true trajectory distribution and the trajectories produced by our method and by various ablations. Using 40 demonstrated trajectories, guided cost learning recovers trajectories that closely match both the mean and the variance of the true distribution. The algorithm without MaxEnt policy optimization or without importance weights recovers the mean but not the variance.
  • Figure 5: Ablations of our method that each leave out one of the two regularization terms. The monotonic regularization improves performance in three of the four task settings, and the local constant rate regularization significantly improves performance in all settings. Reported distance is averaged over four runs of IOC on four different initial conditions.
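Figure 5 names the two regularizers but does not define them here. The sketch below assumes one plausible form of each, computed on the per-timestep costs c(x_t) along a single trajectory. The local constant rate term penalizes the squared second difference of the cost, which encourages the cost to change at a roughly constant rate over time. The monotonic term is a squared hinge, assumed to be applied to demonstrations, that is positive whenever the cost fails to decrease by at least a margin per step. The function names and the `margin` parameter are hypothetical, and the exact forms in the paper may differ.

```python
import numpy as np


def lcr_penalty(costs):
    """Local constant rate term (assumed form).

    costs: per-timestep costs c(x_0), ..., c(x_T) along one trajectory.
    Penalizes the squared second difference, i.e. changes in the rate at
    which the cost changes between consecutive steps.
    """
    c = np.asarray(costs, dtype=float)
    second_diff = (c[2:] - c[1:-1]) - (c[1:-1] - c[:-2])
    return float(np.sum(second_diff ** 2))


def mono_penalty(costs, margin=1.0):
    """Monotonic term (assumed form), applied to a demonstration.

    Squared hinge that is zero when the cost drops by at least `margin`
    (a hypothetical parameter) at every step, and positive otherwise.
    """
    c = np.asarray(costs, dtype=float)
    violation = np.maximum(0.0, c[1:] - c[:-1] + margin)
    return float(np.sum(violation ** 2))
```

Under this sketch, both terms would be added to the IOC objective with some weighting. Suitable weights are not given in this summary.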