Goal-conditioned Imitation Learning

Yiming Ding; Carlos Florensa; Mariano Phielipp; Pieter Abbeel

TL;DR

The paper tackles the difficulty of reward design in goal-directed robotics by combining goal-conditioned imitation learning with hindsight relabeling. It introduces goalGAIL, an off-policy, goal-conditioned adversarial imitation learning framework. It also augments the demonstrations through expert relabeling and supports demonstrations that contain only states. Across four continuous-control MuJoCo tasks, goalGAIL learns faster than HER and is robust to suboptimal experts, and expert relabeling and state-only demonstrations further boost performance. The approach is a promising route to data-efficient policy learning in real-world robotics without task-specific reward engineering, and future work could extend it to vision-based settings.
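To make expert relabeling more concrete, here is a minimal Python sketch. It is one plausible reading of the idea, not the authors' implementation. It assumes that goals live in the same space as states and that a demonstration is stored as states s_0..s_T and actions a_0..a_{T-1}. The helper name `expert_relabel` is hypothetical. For a sampled step t, it treats a state the expert reaches later, s_k with k > t, as the goal for the pair (s_t, a_t). One demonstration therefore yields many goal-conditioned examples.

```python
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def expert_relabel(
    states: Sequence[Vec],
    actions: Sequence[Vec],
    num_samples: int,
    rng: random.Random,
) -> List[Tuple[Vec, Vec, Vec]]:
    """Hypothetical helper: sample (state, action, goal) tuples from one demo.

    states holds s_0..s_T and actions holds a_0..a_{T-1}. Goals are assumed
    to live in the state space, so a later state s_k can serve as a goal.
    """
    T = len(actions)
    if T == 0 or len(states) != T + 1:
        raise ValueError("expected a non-empty demo with len(states) == len(actions) + 1")
    samples: List[Tuple[Vec, Vec, Vec]] = []
    for _ in range(num_samples):
        t = rng.randrange(T)       # time step t in [0, T-1]
        k = rng.randint(t + 1, T)  # later index k in [t+1, T]
        # The expert took a_t from s_t and did reach s_k afterwards,
        # so (s_t, a_t) is a valid (possibly suboptimal) example for goal s_k.
        samples.append((states[t], actions[t], states[k]))
    return samples
```

The expert's action at s_t really was on a path to s_k, so each relabeled tuple is a valid demonstration for that goal, although it may be suboptimal. Such tuples could feed a goal-conditioned behavior-cloning loss or the discriminator of an adversarial method such as goalGAIL. How the paper weights or samples them is not specified here.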

Abstract

Designing rewards for Reinforcement Learning (RL) is challenging because the reward needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, hence setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise to learn policies able to reach many goals, without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state-space. In this work we investigate different approaches to incorporate demonstrations to drastically speed up the convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, we show our method can also be used when the available expert trajectories do not contain the actions, making it possible to leverage kinesthetic or third-person demonstrations. The code is available at https://sites.google.com/view/goalconditioned-il/.
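The sketch below illustrates the hindsight relabeling that HER relies on. Each transition of an episode is relabeled with goals achieved later in the same episode (the "future" strategy), and a sparse reward is recomputed. It is a sketch under several assumptions:

- goals are compared directly with states;
- success means a Euclidean distance within a tolerance `eps`;
- the reward is 0 on success and -1 otherwise.

The names `her_relabel`, `sparse_reward`, `Transition` and `k_future` are illustrative and are not taken from the paper or any library.

```python
import math
import random
from typing import List, NamedTuple, Sequence, Tuple

Vec = Tuple[float, ...]


class Transition(NamedTuple):
    state: Vec
    action: Vec
    next_state: Vec
    goal: Vec
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, eps: float) -> float:
    """Assumed convention: 0.0 if the goal is reached within eps, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= eps else -1.0


def her_relabel(
    states: Sequence[Vec],
    actions: Sequence[Vec],
    goal: Vec,
    k_future: int,
    eps: float,
    rng: random.Random,
) -> List[Transition]:
    """Hypothetical HER-style relabeling with the 'future' strategy.

    states holds s_0..s_T and actions holds a_0..a_{T-1}. Every transition is
    stored once with the original goal and k_future more times with a goal
    drawn from the states reached later in the same episode.
    """
    T = len(actions)
    if len(states) != T + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    buffer: List[Transition] = []
    for t in range(T):
        s, a, s_next = states[t], actions[t], states[t + 1]
        # Original transition: usually a failure (-1) when the goal is hard.
        buffer.append(Transition(s, a, s_next, goal, sparse_reward(s_next, goal, eps)))
        for _ in range(k_future):
            # Pick a future index k in [t+1, T]; s_k was actually reached.
            k = rng.randint(t + 1, T)
            new_goal = states[k]
            buffer.append(
                Transition(s, a, s_next, new_goal, sparse_reward(s_next, new_goal, eps))
            )
    return buffer
```

The original goal may never be reached. Even so, the relabeled copies whose goal equals the next state receive the success reward, and this is how hindsight relabeling extracts a learning signal without a shaped reward. The sketch also shows the limitation the abstract mentions: relabeling only covers states the agent already visits, so reaching distant regions of the state-space still depends on exploration.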
