Coarse-to-Fine 3D Keyframe Transporter

Xupeng Zhu; David Klee; Dian Wang; Boce Hu; Haojie Huang; Arsh Tangri; Robin Walters; Robert Platt

TL;DR

The paper introduces a Coarse-to-Fine 3D Keyframe Transporter that exploits bi-equivariant symmetry in Keyframe Imitation Learning to learn SE(3) manipulation actions efficiently. It replaces 2D cross-correlation with 3D cross-correlation and adds an SE(3) coarse-to-fine (C2F) action evaluator, which yields strong sample efficiency and broad task coverage, including pushing, turning, and tool use. Key contributions include a bi-equivariant policy formulation, action inference based on 3D cross-correlation, in-hand segmentation, and a multi-level C2F scheme that significantly reduces computation. With limited demonstrations, the method outperforms Keyframe IL baselines on RLBench and in real-world tasks, by an average of more than 10% in simulation and 55% across 4 physical experiments.

Abstract

Recent advances in Keyframe Imitation Learning (IL) have enabled learning-based agents to solve a diverse range of manipulation tasks. However, most approaches ignore the rich symmetries in the problem setting and, as a consequence, are sample-inefficient. This work identifies and utilizes the bi-equivariant symmetry within Keyframe IL to design a policy that generalizes to transformations of both the workspace and the objects grasped by the gripper. We make two main contributions: First, we analyze the bi-equivariance properties of the keyframe action scheme and propose a Keyframe Transporter, derived from Transporter Networks, which evaluates actions using cross-correlation between the features of the grasped object and the features of the scene. Second, we propose a computationally efficient coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation components of an action. The resulting method outperforms strong Keyframe IL baselines by an average of more than 10% on a wide range of simulation tasks, and by an average of 55% in 4 physical experiments.
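
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It scores candidate placements by 3D cross-correlation between a "grasped object" volume and a "scene" volume, then narrows the search coarse-to-fine. Several simplifications are assumptions made here: raw voxel values stand in for learned features, only translations are searched (the rotational part of SE(3) is omitted), there is a single refinement level, and the function names `correlate3d`, `downsample`, and `coarse_to_fine_translation` are hypothetical.

```python
import numpy as np


def correlate3d(scene, kernel):
    """Valid-mode 3D cross-correlation.

    score[i, j, k] is the sum of scene[i:i+a, j:j+b, k:k+c] * kernel,
    where (a, b, c) is the kernel shape.
    """
    windows = np.lib.stride_tricks.sliding_window_view(scene, kernel.shape)
    return np.einsum("ijkabc,abc->ijk", windows, kernel)


def downsample(vol, f):
    """Average-pool a 3D volume by an integer factor f (cropping any remainder)."""
    d, h, w = (s // f * f for s in vol.shape)
    v = vol[:d, :h, :w]
    return v.reshape(d // f, f, h // f, f, w // f, f).mean(axis=(1, 3, 5))


def coarse_to_fine_translation(scene, kernel, factor=2, margin=2):
    """Return the best translation offset, found with a two-level search."""
    # Coarse pass: correlate low-resolution copies to get a rough offset.
    coarse = correlate3d(downsample(scene, factor), downsample(kernel, factor))
    rough = np.array(np.unravel_index(np.argmax(coarse), coarse.shape)) * factor

    # Fine pass: evaluate only offsets within `margin` voxels of the rough one.
    max_off = np.array(scene.shape) - np.array(kernel.shape)
    lo = np.clip(rough - margin, 0, max_off)
    hi = np.clip(rough + margin, 0, max_off)
    a, b, c = kernel.shape
    crop = scene[lo[0]:hi[0] + a, lo[1]:hi[1] + b, lo[2]:hi[2] + c]
    fine = correlate3d(crop, kernel)
    best = np.array(np.unravel_index(np.argmax(fine), fine.shape)) + lo
    return tuple(int(x) for x in best)


# Toy usage: hide a random positive "object" in an empty scene and recover it.
rng = np.random.default_rng(0)
kernel = rng.random((4, 4, 4)) + 0.1
scene = np.zeros((16, 16, 16))
scene[6:10, 4:8, 8:12] = kernel
print(coarse_to_fine_translation(scene, kernel))
```

The fine pass evaluates at most `(2 * margin + 1) ** 3` offsets at full resolution instead of every offset in the volume. This illustrates the kind of saving a coarse-to-fine evaluator aims for; it says nothing about the specific multi-level SE(3) scheme used in the paper.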
