SLIM: Skill Learning with Multiple Critics

David Emukpere, Bingbing Wu, Julien Perez, Jean-Michel Renders

TL;DR

SLIM is a multi-critic learning approach for skill discovery, with a particular focus on robotic manipulation. It acquires safe and efficient motor primitives that can be used in a hierarchical reinforcement learning fashion and leveraged through planning, significantly surpassing baseline approaches for skill discovery.

Abstract

Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models based on mutual information maximization have been successful in this task but still struggle in the context of robotic manipulation. Because manipulation requires influencing a potentially large set of degrees of freedom in the environment, mutual information maximization alone fails to produce useful and safe manipulation behaviors. Furthermore, augmenting skill discovery rewards with additional rewards through a naive combination might fail to produce the desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that using multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation, while overcoming the interference among rewards that can hinder convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach to acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, significantly surpassing baseline approaches for skill discovery.
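
As a rough illustration of the multi-critic idea, the sketch below follows the description in the Figure 1 caption: one critic per reward function, with policy improvement driven by a weighted combination of the critics' normalized advantages. The details are assumptions made for illustration, not the authors' implementation: the reward names (`skill_discovery`, `safety`), the weights, and the choice of per-batch standardization (zero mean, unit standard deviation) as the normalization are all hypothetical.

```python
from statistics import fmean, pstdev


def normalize(advantages, eps=1e-8):
    """Standardize one critic's advantages over the batch (assumed normalization)."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


def combine_advantages(per_critic_advantages, weights):
    """Return the weighted sum of per-critic normalized advantages.

    per_critic_advantages: dict mapping a reward name to a list of advantage
        estimates, one per sample in the batch.
    weights: dict mapping the same reward names to scalar weights
        (hypothetical values, not taken from the paper).
    """
    names = list(per_critic_advantages)
    batch_size = len(per_critic_advantages[names[0]])
    combined = [0.0] * batch_size
    for name in names:
        advantages = per_critic_advantages[name]
        if len(advantages) != batch_size:
            raise ValueError(f"critic '{name}' has a different batch size")
        # Normalizing first puts every critic on a comparable scale, so a
        # reward with large magnitude cannot dominate the policy update.
        for i, a in enumerate(normalize(advantages)):
            combined[i] += weights[name] * a
    return combined


# Toy batch: the two rewards live on very different scales.
advantages = {
    "skill_discovery": [0.1, 0.3, 0.2, 0.4],
    "safety": [-50.0, 10.0, -20.0, 60.0],
}
weights = {"skill_discovery": 1.0, "safety": 0.5}
combined = combine_advantages(advantages, weights)
```

In an actor-critic update, `combined` would stand in for the single advantage term of the policy loss. A naive alternative sums the raw rewards and trains one critic, which lets the larger-scale reward dominate; that is the kind of interference among rewards the abstract refers to.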

Paper Structure

This paper contains 11 sections, 6 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Skill Learning wIth Multiple critics. Our approach enables the effective combination of multiple objectives for self-supervised skill discovery in robotic manipulation. We learn a dedicated critic for each intrinsic reward function; during policy improvement, the critics are combined by taking a weighted combination of their normalized advantages. (a) Schematic diagram (b) Simulation top view (c) Simulation side view.
  • Figure 2: Skill trajectories for SLIM, SLIM ablations, and baselines. SLIM outperforms baselines in terms of grasping consistency and the diversity of the cube's displacement. The baselines do not learn to pick up the cube. While SLIM ablations show different levels of object interaction, with both picking and pushing behaviors emerging, only SLIM learns interactive, diverse, and safe displacement manipulations.
  • Figure 3: Coverage and Safety. Coverage is the number of boxes discretizing the workspace covered by the object. Safety is the ratio of safe states encountered during random skill rollouts.
  • Figure 4: Performance on downstream tasks. We evaluate our approach on the position-matching and orientation-matching tasks. SLIM enables improved sample efficiency across all downstream tasks.
  • Figure 5: Safe trajectory following. We evaluate our HRL policies, trained with SLIM skills as motor primitives, on safe trajectory following. The six trajectories evaluated are shown in order from top left to bottom right.
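
The two metrics defined in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the grid resolution, the workspace bounds, the clamping of out-of-bounds positions to edge cells, and the function names `coverage` and `safety_ratio` are all hypothetical. The caption does not say how a state is judged safe, so the safety flags are taken as given.

```python
import math


def coverage(object_positions, workspace_min, workspace_max, bins_per_axis):
    """Count the distinct grid boxes visited by the object.

    The workspace is discretized into bins_per_axis boxes along each axis.
    Positions outside the bounds are clamped to the edge boxes (an assumption).
    """
    visited = set()
    for position in object_positions:
        cell = []
        for x, lo, hi in zip(position, workspace_min, workspace_max):
            fraction = (x - lo) / (hi - lo)
            index = math.floor(fraction * bins_per_axis)
            cell.append(min(bins_per_axis - 1, max(0, index)))
        visited.add(tuple(cell))
    return len(visited)


def safety_ratio(safe_flags):
    """Fraction of encountered states flagged as safe (0.0 for an empty rollout)."""
    flags = list(safe_flags)
    return sum(flags) / len(flags) if flags else 0.0


# Toy rollout on a unit-square workspace split into a 2 x 2 grid.
positions = [(0.05, 0.05), (0.06, 0.04), (0.55, 0.05), (0.95, 0.95)]
visited_boxes = coverage(positions, (0.0, 0.0), (1.0, 1.0), bins_per_axis=2)
safe_fraction = safety_ratio([True, True, False, True])
```

Here the first two positions fall into the same box, so the object covers three of the four boxes, and three of the four states are flagged safe.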