Non-rigid Relative Placement through 3D Dense Diffusion

Eric Cai, Octavian Donca, Ben Eisner, David Held

TL;DR

This work proposes "cross-displacement", an extension of the principles of relative placement to geometric relationships between deformable objects, and presents a novel vision-based method to learn cross-displacement through dense diffusion, demonstrating its ability to generalize to unseen object instances, out-of-distribution scene configurations, and multimodal goals on multiple highly deformable tasks beyond the scope of prior works.

Abstract

The task of "relative placement" is to predict the placement of one object in relation to another, e.g., placing a mug onto a mug rack. Through explicit object-centric geometric reasoning, recent methods for relative placement have made tremendous progress towards data-efficient learning for robot manipulation while generalizing to unseen task variations. However, they have yet to represent deformable transformations, despite the ubiquity of non-rigid bodies in real-world settings. As a first step towards bridging this gap, we propose "cross-displacement", an extension of the principles of relative placement to geometric relationships between deformable objects, and present a novel vision-based method to learn cross-displacement through dense diffusion. We demonstrate our method's ability to generalize to unseen object instances, out-of-distribution scene configurations, and multimodal goals on multiple highly deformable tasks (both in simulation and in the real world) beyond the scope of prior works. Supplementary information and videos can be found at https://sites.google.com/view/tax3d-corl-2024

Paper Structure

This paper contains 51 sections, 3 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Our method (TAX3D) uses dense displacement diffusion to determine how to perform a deformable cloth-hanging task for an unseen scene configuration.
  • Figure 2: (Left) During inference, randomly sampled displacements $\Delta X_T \sim \mathcal{N}(0, \mathbf{I})$ are denoised conditioned on action ($\mathbf{P}_{\mathcal{A}}$) and anchor ($\mathbf{P}_{\mathcal{B}}$) features; the final $\Delta X_0$ is predicted to displace the action into a goal configuration. (Right) Our modified DiT architecture combines self-attention and cross-attention for object-centric and scene-level reasoning.
  • Figure 3: TAX3D generalizes to diverse cloths and anchor positions (top); we also visualize the corresponding goal predictions (middle) and successful rollouts (bottom) after releasing the cloth. The two rightmost columns are HangBag configurations; all others are HangProcCloth configurations.
  • Figure 4: Multimodal TAX3D predictions (left), with successful rollouts (right).
  • Figure 5: Real-world results. TAX3D succeeds under varying anchor poses, varying peg placements (left, middle-left), and can model multimodal placements with multiple pegs (middle-right, right).
  • ...and 3 more figures
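
The inference loop described in the Figure 2 caption can be sketched as a standard DDPM-style reverse process over per-point displacements. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the step count, all function names, and especially `toy_noise_predictor` are hypothetical. The predictor is a closed-form stand-in for the learned network; it assumes the goal is simply to move the action cloud's centroid onto the anchor cloud's centroid, which is far simpler than a real deformable placement.

```python
import numpy as np


def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns betas and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def toy_noise_predictor(delta_x_t, alpha_bar_t, action_points, anchor_points):
    """Hypothetical stand-in for the learned denoiser.

    Pretends the only valid displacement is the translation that moves the
    action centroid onto the anchor centroid, and returns the noise that is
    consistent with that target at the current noise level.
    """
    target = anchor_points.mean(axis=0) - action_points.mean(axis=0)  # shape (3,)
    target = np.broadcast_to(target, delta_x_t.shape)
    return (delta_x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)


def sample_displacements(action_points, anchor_points, num_steps=50, seed=0):
    """Reverse diffusion: start from Gaussian noise and denoise to Delta X_0."""
    rng = np.random.default_rng(seed)
    betas, alpha_bars = linear_schedule(num_steps)

    # Delta X_T ~ N(0, I), one 3D displacement per action point.
    delta_x = rng.standard_normal(action_points.shape)

    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(
            delta_x, alpha_bars[t], action_points, anchor_points
        )
        # DDPM posterior mean for the less-noisy displacement field.
        mean = (
            delta_x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat
        ) / np.sqrt(1.0 - betas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(delta_x.shape) if t > 0 else 0.0
        delta_x = mean + np.sqrt(betas[t]) * noise

    return delta_x


def predict_goal(action_points, anchor_points, num_steps=50, seed=0):
    """Displace the action point cloud by the denoised Delta X_0."""
    delta_x_0 = sample_displacements(action_points, anchor_points, num_steps, seed)
    return action_points + delta_x_0
```

Calling `predict_goal(action_points, anchor_points)` with two `(N, 3)` and `(M, 3)` arrays returns an `(N, 3)` goal cloud. In the method described above, the noise predictor would instead be a learned network conditioned on action and anchor features, so different random seeds could yield different valid (multimodal) placements; the toy predictor here has a single mode.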