E2R: a Hierarchical-Learning inspired Novelty-Search method to generate diverse repertoires of grasping trajectories

Johann Huber; Oumar Sane; Alex Coninx; Faiz Ben Amar; Stephane Doncieux

TL;DR

This work introduces a new NS-based method that generates large datasets of grasping trajectories in a platform-agnostic manner. Inspired by the hierarchical learning paradigm, it decouples approach and prehension to make the behavioral space smoother.

Abstract

Robotic grasping refers to the task of making a robotic system pick up an object by applying forces and torques on its surface. Despite recent advances in data-driven approaches, grasping remains an unsolved problem. Most works on this task rely on priors and heavy constraints to avoid the exploration problem. Novelty Search (NS) refers to evolutionary algorithms that replace the selection of the best-performing individuals with the selection of the most novel ones. Such methods have already shown promising results on hard exploration problems. In this work, we introduce a new NS-based method that can generate large datasets of grasping trajectories in a platform-agnostic manner. Inspired by the hierarchical learning paradigm, our method decouples approach and prehension to make the behavioral space smoother. Experiments conducted on 3 different robot-gripper setups and on several standard objects show that our method outperforms the state of the art at generating diverse repertoires of grasping trajectories, achieving a higher successful run ratio as well as better diversity for both approach and prehension. Some of the generated solutions have been successfully deployed on a real robot, showing the exploitability of the obtained repertoires.

Paper Structure (16 sections, 3 equations, 6 figures, 1 algorithm)

INTRODUCTION
RELATED WORKS
Learning to grasp
Generating repertoire of grasping policies
Leveraging diversity for robotics learning
METHOD
Novelty Search
Intuition
How to divide the task without actually doing it
Algorithm
EXPERIMENTAL SETUP
Environments
Algorithms
Metrics
RESULTS AND DISCUSSION
...and 1 more section

Figures (6)

Figure 1: Diversity of grasping trajectories produced by E2R versus the state of the art [morel2022automatic]. Each trajectory is plotted as a succession of end-effector positions in Cartesian space. Color encodes time, from purple to yellow. Trajectories are plotted until the gripper touches the object. Each plot shows 250 trajectories randomly sampled from an output repertoire, obtained from runs on 3 different robot-gripper-object setups. While NSMBS's diversity is limited to local search around the first trajectory found, solutions generated by E2R are spread across the whole operational space.
Figure 2: E2R overall principle. We first attempt to find a first successful grasp through a mutation-selection principle applied to a population of open-loop trajectories, favoring the most novel solutions [lehman2011abandoning]. To find diverse policies, we then decompose the task into an approach (explore) subtask and a prehension (refine) subtask: mutation randomly focuses on one of them, while selection is driven by behavior descriptors that distinguish diversity of approach ($b_2$, $b_3$) from diversity of prehension ($b_4$, $b_5$). The novelty archive is used as a long-term memory of already discovered behaviors, pushing the search toward diversity on each descriptor (MultiBCSel [morel2022automatic]). The population is frequently refilled with the most novel successful solutions on both tasks (regenerate). The output is a large and diverse repertoire of successful open-loop grasping policies.
Figure 3: Success ratio and size of the output repertoire of grasping trajectories for the evaluated methods. Averaged over 5 seeds on all the evaluated objects. Error bars show the $0.95$ confidence interval. E2R outperforms all other methods in success rate on the evaluated setups. The number of generated solutions makes E2R and NSMBS the two most promising methods for generating diverse grasping trajectories.
Figure 4: Evolution of the success repertoire's diversity throughout the evolutionary process. Averaged over 5 seeds on all the evaluated objects. The bands show the standard deviation. E2R generates more diverse trajectories on the evaluated setups, regarding both approach and prehension ($p < 10^{-3}$).
Figure 5: Sim2real transferability rate of trajectories randomly sampled from success repertoires, evaluated on a real Baxter robot. Averaged over 3 seeds. Error bars show the $0.95$ confidence interval. The large diversity of E2R's generated solutions does not compromise their exploitability on a real robot.
...and 1 more figure
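
The novelty-driven selection mentioned in the abstract and in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the usual Novelty Search score, namely the mean distance from an individual's behavior descriptor to its k nearest neighbors among the archive and the rest of the population. The function names `novelty` and `select_most_novel`, the 2-D descriptors, and the value of `k` are hypothetical choices made for illustration; this is not the E2R implementation, and it does not reproduce the per-descriptor selection (MultiBCSel) or the explore/refine/regenerate steps.

```python
import math


def novelty(descriptor, others, k=3):
    """Mean Euclidean distance from `descriptor` to its k nearest neighbors.

    `others` holds the descriptors to compare against (archive + population).
    Returns infinity when there is nothing to compare against (assumed convention).
    """
    nearest = sorted(math.dist(descriptor, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")


def select_most_novel(population, archive, n_keep, k=3):
    """Keep the `n_keep` individuals with the highest novelty score.

    Each individual is a (genome, descriptor) pair. Fitness is never consulted:
    selection depends only on how far a behavior lies from known behaviors.
    """
    scored = []
    for i, (_, desc) in enumerate(population):
        # Compare against the archive and every other member of the population.
        others = archive + [d for j, (_, d) in enumerate(population) if j != i]
        scored.append((novelty(desc, others, k), i))
    scored.sort(reverse=True)
    return [population[i] for _, i in scored[:n_keep]]


if __name__ == "__main__":
    # Toy descriptors: "a" and "b" behave alike, "c" is far from everything seen so far.
    population = [("a", (0.0, 0.0)), ("b", (0.1, 0.0)), ("c", (5.0, 5.0))]
    archive = [(0.0, 0.1)]
    survivors = select_most_novel(population, archive, n_keep=1, k=2)
    print(survivors[0][0])
```

In this toy run the individual labeled "c" is retained because its descriptor lies far from both the archive and the other individuals, which is the pressure toward unexplored behaviors that the captions describe.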
