Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Paul Jansonnie, Bingbing Wu, Julien Perez, Jan Peters

TL;DR

This work proposes a novel Skill Learning approach that discovers composable behaviors by solving a large and diverse set of autonomously generated tasks. It leverages Asymmetric Self-Play to discover behaviors and Multiplicative Compositional Policies to embed them.

Abstract

Learning skills that interact with objects is of major importance for robotic manipulation. These skills can serve as an efficient prior for solving various manipulation tasks. We propose a novel Skill Learning approach that discovers composable behaviors by solving a large and diverse set of autonomously generated tasks. Our method learns skills that allow the robot to interact consistently and robustly with objects in its environment. The discovered behaviors are embedded in primitives, which can be composed with Hierarchical Reinforcement Learning to solve unseen manipulation tasks. In particular, we leverage Asymmetric Self-Play to discover behaviors and Multiplicative Compositional Policies to embed them. We compare our method to Skill Learning baselines and find that our skills are more interactive. Furthermore, the learned skills can be used to solve a set of unseen manipulation tasks, in simulation as well as on a real robotic platform.
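To make the idea of embedding behaviors in composable primitives more concrete, here is a minimal sketch of multiplicative composition. It assumes each primitive outputs a diagonal Gaussian over actions and that a gating module (the orchestrator, in downstream use) supplies non-negative weights; the abstract does not state these details. The function name `compose_gaussian_primitives` and its arguments are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple


def compose_gaussian_primitives(
    means: Sequence[Sequence[float]],
    stds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Multiplicatively compose k diagonal-Gaussian primitives (illustrative).

    Assumption: the composite density is proportional to
    prod_i N(mu_i, sigma_i^2) ** w_i, which is again Gaussian.
    Per action dimension j:
        precision_j = sum_i w_i / sigma_ij^2
        mean_j      = (sum_i w_i * mu_ij / sigma_ij^2) / precision_j
        std_j       = precision_j ** -0.5
    """
    if not (len(means) == len(stds) == len(weights)):
        raise ValueError("means, stds and weights must have the same length")
    if not means:
        raise ValueError("at least one primitive is required")

    action_dim = len(means[0])
    composite_mean: List[float] = []
    composite_std: List[float] = []
    for j in range(action_dim):
        # Weighted sum of per-primitive precisions for this dimension.
        precision = sum(w / (s[j] ** 2) for w, s in zip(weights, stds))
        if precision <= 0.0:
            raise ValueError("weights must give a positive total precision")
        # Precision-weighted sum of the primitive means.
        weighted_mean = sum(
            w * m[j] / (s[j] ** 2) for w, m, s in zip(weights, means, stds)
        )
        composite_mean.append(weighted_mean / precision)
        composite_std.append(precision ** -0.5)
    return composite_mean, composite_std


if __name__ == "__main__":
    # Two hypothetical primitives acting on a 2-D action space.
    mean, std = compose_gaussian_primitives(
        means=[[0.0, 1.0], [2.0, 1.0]],
        stds=[[1.0, 0.5], [1.0, 0.5]],
        weights=[1.0, 1.0],
    )
    print(mean, std)
```

Under these assumptions, a primitive with zero weight has no influence on the composite action, while a primitive with a larger weight or a smaller standard deviation pulls the composite mean toward its own. This is how a higher-level policy can blend pre-trained behaviors rather than select exactly one.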

Paper Structure (25 sections, 3 equations, 4 figures, 1 table)


Figures (4)

  • Figure 1: Method Overview: A task solver is pre-trained in simulation to solve tasks that are autonomously proposed by a task generator (left). The solver discovers diverse behaviors and embeds them in its primitives. On each downstream task, an orchestrator is trained to reuse the pre-trained primitives (middle). Downstream agents are transferred to a real robotic platform (right).
  • Figure 2: Downstream Tasks: The robot must move the solid red cube to a target position represented as a transparent red cube. The orange zone represents where the initial position of the cube can be sampled, while the green zone represents where the target position of the cube can be sampled.
  • Figure 3: Per-Skill Trajectories: Trajectories of object positions for $N_\text{skills}=32$ random compositions of skills. Each color corresponds to a random composition. Our primitives increasingly learn to cover reachable positions of the object. The baselines partially cover the workspace (DIAYN, LSD), push and throw the object to unreachable positions (LSD), or very rarely interact with the object (DADS).
  • Figure 4: Downstream Learning Curves: Average success rate throughout training on a downstream task. The solid area represents the standard deviation. "Scratch" baselines are denoted with a "0" for compactness. Our method is overall the most competitive both in terms of sample efficiency and final performance.