Robust Manipulation Primitive Learning via Domain Contraction

Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon

TL;DR

This work proposes a bi-level approach to learn robust manipulation primitives, including parameter-augmented policy learning using multiple models, and parameter-conditioned policy retrieval through domain contraction, which unifies domain randomization and domain adaptation.

Abstract

Contact-rich manipulation plays an important role in human daily activities, but uncertain parameters pose significant challenges for robots to achieve comparable performance through planning and control. To address this issue, domain adaptation and domain randomization have been proposed for robust policy learning. However, they either lose the generalization ability across diverse instances or perform conservatively due to neglecting instance-specific information. In this paper, we propose a bi-level approach to learn robust manipulation primitives, including parameter-augmented policy learning using multiple models, and parameter-conditioned policy retrieval through domain contraction. This approach unifies domain randomization and domain adaptation, providing optimal behaviors while keeping generalization ability. We validate the proposed method on three contact-rich manipulation primitives: hitting, pushing, and reorientation. The experimental results showcase the superior performance of our approach in generating robust policies for instances with diverse physical parameters.

Paper Structure

This paper contains 24 sections, 2 theorems, 29 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Given domain parameters $\bm{\alpha}$ and the distribution $\bm{p}$, the parameter-conditioned policy can be retrieved from the weighted sum of parameter-specific advantage functions.
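
The weighted sum in Theorem 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the advantage table is random stand-in data, the grid sizes and the helper name `retrieve_policy` are hypothetical, and reading the policy off as a greedy argmax over actions is an assumption made for illustration. Treating a uniform weight vector as the randomization-like case and a one-hot vector as the adaptation-like case is likewise an interpretation of the abstract, not a statement from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretization: 4 parameter values, 5 states, 3 actions.
n_alpha, n_state, n_action = 4, 5, 3

# Random stand-in for a learned parameter-augmented advantage A(alpha, s, a).
advantage = rng.normal(size=(n_alpha, n_state, n_action))


def retrieve_policy(advantage, p):
    """Weight parameter-specific advantages by p and act greedily (assumed)."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalize so the weights form a distribution
    # Weighted sum over the parameter axis: result has shape (n_state, n_action).
    contracted = np.tensordot(p, advantage, axes=(0, 0))
    return contracted.argmax(axis=1), contracted


# Broad belief over parameters (randomization-like case).
uniform = np.full(n_alpha, 1.0 / n_alpha)
policy_broad, adv_broad = retrieve_policy(advantage, uniform)

# Point-mass belief on parameter index 2 (adaptation-like case).
one_hot = np.eye(n_alpha)[2]
policy_point, adv_point = retrieve_policy(advantage, one_hot)
```

With a point-mass distribution the weighted sum reduces to the advantage slice of that single parameter value; with uniform weights it is the plain average across parameter values.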

Figures (5)

  • Figure 1: Overview of the proposed bi-level approach. Left: Parameter-augmented policy training using multiple models. The state, action, and parameter variables are denoted in black, blue, and red colors, respectively. Right: Parameter-conditioned policy retrieval through domain contraction. The retrieved policies perform well in terms of both generalization and optimality given a diverse set of objects with different shapes, weights, and friction parameters.
  • Figure 2: Overview of domain contraction in TT format. Using TTPI, we can obtain the parameter-augmented advantage function in TT format. It consists of separate third-order cores, one for each dimension (parameter, state, and action). This figure shows the advantage function for the Hit primitive. Given the parameter distribution (obtained from either human knowledge or system identification), we can retrieve the parameter-conditioned policy by taking the product of the parameter distributions with the corresponding TT cores.
  • Figure 3: Comparison of final state error given different estimated parameter distributions.
  • Figure 4: TT decomposition generalizes matrix decomposition techniques to higher-dimensional arrays. In TT format, an element in a tensor can be obtained by multiplying specific slices of the core tensors. The figure presents examples of second-order, third-order, and fourth-order tensors. Image adapted from shetty2016tensor.
  • Figure 5: Shape parametrization of a mustard bottle using basis functions.
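
The captions of Figures 2 and 4 describe two TT operations: reading one tensor element by multiplying core slices, and contracting the parameter core with a parameter distribution. The sketch below shows both on a toy third-order tensor. The mode sizes, TT ranks, random cores, and the helper names `tt_element`, `tt_full`, and `contract_parameter` are all hypothetical; the cores in the paper come from TTPI, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mode sizes (parameter, state, action) and TT ranks.
sizes = (4, 5, 3)
ranks = (1, 2, 2, 1)

# One third-order core per mode, shaped (left rank, mode size, right rank).
cores = [rng.normal(size=(ranks[k], sizes[k], ranks[k + 1])) for k in range(3)]


def tt_element(cores, index):
    """Read one element by multiplying the selected slice of each core."""
    out = np.ones((1, 1))
    for core, i in zip(cores, index):
        out = out @ core[:, i, :]
    return out[0, 0]


def tt_full(cores):
    """Rebuild the dense tensor (only feasible for toy sizes)."""
    return np.einsum("aib,bjc,ckd->ijk", *cores)


def contract_parameter(cores, p):
    """Sum the parameter core against p and absorb it into the state core."""
    weighted = np.einsum("i,aib->ab", p, cores[0])  # shape (1, r1)
    state_core = np.einsum("ab,bjc->ajc", weighted, cores[1])
    return [state_core, cores[2]]


p = np.array([0.1, 0.2, 0.3, 0.4])  # assumed parameter distribution
contracted = contract_parameter(cores, p)

# Dense state-action table of the contracted TT, for inspection.
table = np.einsum("ajc,ckd->jk", *contracted)
```

The contraction touches only the parameter core and its neighbor, so its cost depends on the core sizes rather than on the size of the full tensor.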

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof