Fourier Transporter: Bi-Equivariant Robotic Manipulation in 3D

Haojie Huang, Owen Howell, Dian Wang, Xupeng Zhu, Robin Walters, Robert Platt

TL;DR

This work proposes Fourier Transporter (FourTran), which leverages the two-fold $\mathrm{SE}(d)\times\mathrm{SE}(d)$ symmetry of the pick-place problem to achieve much higher sample efficiency, and uses a fiber-space Fourier transformation that allows for memory-efficient construction.

Abstract

Many complex robotic manipulation tasks can be decomposed into a sequence of pick and place actions. Training a robotic agent to learn this sequence over many different starting conditions typically requires many iterations or demonstrations, especially in 3D environments. In this work, we propose Fourier Transporter (FourTran), which leverages the two-fold $\mathrm{SE}(d)\times\mathrm{SE}(d)$ symmetry of the pick-place problem to achieve much higher sample efficiency. FourTran is an open-loop behavior cloning method trained on expert demonstrations to predict pick-place actions in new environments. It is constrained to incorporate the symmetries of the pick and place actions independently. Our method uses a fiber-space Fourier transformation that allows for memory-efficient construction. We test the proposed network on the RLBench benchmark and achieve state-of-the-art results across various tasks.
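
As a rough intuition for why a Fourier representation over rotations can save memory, the sketch below uses a 1D analogy on $\mathrm{SO}(2)$: a band-limited signal sampled at many angles is stored exactly by a handful of Fourier coefficients. This is an illustrative assumption, not the FourTran implementation; the sample count `N`, the band limit `L`, and the test signal are all hypothetical.

```python
import numpy as np

N = 64  # hypothetical number of sampled rotation angles
L = 3   # hypothetical maximum retained frequency

# A band-limited signal over the circle (frequencies 0..3 only).
theta = 2 * np.pi * np.arange(N) / N
signal = 0.5 + np.cos(theta) - 0.8 * np.sin(2 * theta) + 0.3 * np.cos(3 * theta)

# Real FFT over the rotation axis: N // 2 + 1 complex coefficients.
coeffs = np.fft.rfft(signal)

# Keep only frequencies 0..L: 4 complex numbers instead of 64 samples.
compact = coeffs[: L + 1]

# Reconstruct by zero-padding the discarded frequencies.
padded = np.zeros(N // 2 + 1, dtype=complex)
padded[: L + 1] = compact
reconstructed = np.fft.irfft(padded, n=N)

max_error = np.max(np.abs(signal - reconstructed))
print(f"stored {compact.size} coefficients, max error {max_error:.2e}")
```

Because the signal contains no frequency above `L`, the truncated representation reconstructs it to floating-point precision; for signals that are not band-limited, truncation would be an approximation.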

Paper Structure (33 sections, 4 theorems, 60 equations, 5 figures, 4 tables, 1 algorithm)

Key Result

Proposition 1

Equation eqn:place-architecture satisfies the bi-equivariant symmetry stated in Equation eqn:place-symmetry if the following constraints hold:

Figures (5)

  • Figure 1: Illustration of bi-equivariance in 2D (left) and 3D (right). The place action, $a'=g_2ag_1^{-1}$, is symmetric with respect to both the orientation of the object to be picked, $g_1$, and the orientation of the place target, $g_2$.
  • Figure 2: Architecture of FourTran. $f_{\mathrm{pick}}$ first detects a task-appropriate pick pose. The crop $c$ centered at the pick location is fed to the network $\psi$. The lift operation generates a stack of rotated features, and the Fourier transformation $\mathcal{F}^{+}$ is applied to the channel space of the features to output the dynamic kernel $\kappa(c)$. The cross-correlation is then performed in Fourier space.
  • Figure 3: 3D pick and place tasks. From left to right, the tasks are: Stack-Blocks, Stack-Cups, Stack-Wine, Place-Cups, and Put-Plate. The top row shows the initial scene and the bottom row shows the completion state.
  • Figure 4: 2D pick and place task descriptions. Left: Block-insertion task. Center: Assembling kits task. Right: Sweeping-piles task.
  • Figure 5: Visualization of expert $\mathrm{SO}(3)$ actions from 10 demonstrations. First column: expert pick action. Second column: expert place action. First row: stack-wine. Second row: put-plate. The orientation visualization follows the "YXY" convention. For more detail on plot formatting, see Murphy_2022_implicitpdf.
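
The bi-equivariance relation in Figure 1, $a'=g_2ag_1^{-1}$, can be checked numerically in 2D. The sketch below is a minimal illustration under a stated assumption: the place action is modeled as the rigid transform that carries the object pose onto the target pose, so that $a = T_{\mathrm{target}}T_{\mathrm{object}}^{-1}$. The helper names `se2` and `place_action` and all numeric poses are hypothetical and do not come from the paper.

```python
import numpy as np

def se2(theta, tx, ty):
    """Homogeneous 3x3 matrix for a planar rotation by theta plus a translation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def place_action(obj_pose, target_pose):
    """Assumed model: the action a satisfies a @ obj_pose == target_pose."""
    return target_pose @ np.linalg.inv(obj_pose)

# Hypothetical object and place-target poses.
obj = se2(0.3, 1.0, 2.0)
tgt = se2(-1.1, 4.0, -0.5)
a = place_action(obj, tgt)

# Transform the object by g1 and the place target by g2, independently.
g1 = se2(0.7, -2.0, 0.4)
g2 = se2(2.0, 0.3, 1.5)

# Action recomputed from the transformed scene.
a_new = place_action(g1 @ obj, g2 @ tgt)

# Action predicted by the bi-equivariance relation a' = g2 a g1^{-1}.
a_pred = g2 @ a @ np.linalg.inv(g1)

bi_equivariant = np.allclose(a_new, a_pred)
print("bi-equivariance holds:", bi_equivariant)
```

Under this model the identity follows directly: $(g_2T_{\mathrm{target}})(g_1T_{\mathrm{object}})^{-1} = g_2\,a\,g_1^{-1}$. The same check carries over to 3D by replacing the $3\times 3$ planar matrices with $4\times 4$ homogeneous $\mathrm{SE}(3)$ matrices.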

Theorems & Definitions (8)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • proof 1
  • Lemma 2
  • proof 2
  • proof 3
  • proof 4