
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects

Dominik Bauer, Zhenjia Xu, Shuran Song

TL;DR

DoughNet addresses the challenge of topological manipulation in elastoplastic objects by introducing a topology-aware visual predictive model that operates in latent space. It combines a Transformer-based shape encoder with a dynamics model that autoregressively predicts geometry and topology changes, and a geometry decoder that outputs per-component occupancy masks along with genus classification. Trained on synthetic MLS-MPM data with an explicit topology-checking pipeline, DoughNet achieves superior long-horizon predictions and enables goal-directed planning via a CEM-based planner, including sim-to-real transfer to real robotic setups. The approach advances planning for complex manipulations, where topology, not just geometry, determines success, and provides a data generator to facilitate future research in topological manipulation of deformable objects.
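To make the planning step concrete, the following is a minimal sketch of a cross-entropy method (CEM) loop, the family of planner the summary names. It is not the authors' implementation: `cem_plan`, `toy_cost`, the population sizes, and the two-dimensional action are all illustrative assumptions. In DoughNet the cost would come from comparing the predicted outcome of a sampled plan against the goal; here a simple quadratic stands in for that comparison.

```python
import random
import statistics


def cem_plan(cost_fn, dim, iters=20, pop=64, n_elite=8, seed=0):
    """Generic CEM: minimise cost_fn over a dim-dimensional action vector."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(iters):
        # Sample candidate plans from the current diagonal Gaussian.
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)
        ]
        # Keep the lowest-cost candidates (the elites).
        elites = sorted(samples, key=cost_fn)[:n_elite]
        # Refit the Gaussian to the elites; floor the std to avoid collapse.
        for d in range(dim):
            column = [e[d] for e in elites]
            mean[d] = statistics.fmean(column)
            std[d] = max(statistics.pstdev(column), 1e-3)
    return mean


def toy_cost(action):
    """Hypothetical stand-in for 'predict the outcome, compare with the goal'."""
    target = [0.5, -0.25]
    return sum((a - t) ** 2 for a, t in zip(action, target))


best = cem_plan(toy_cost, dim=2)
```

Replacing `toy_cost` with a function that rolls a predictive model forward and scores the result against a goal state would give the overall shape of model-based planning described above.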

Abstract

Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. Accurately predicting the topological changes that a specific action might cause is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby enables planning of robotic manipulation: selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.
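The autoregressive prediction described in the abstract can be sketched as a loop in which each predicted set of latent codes becomes the input for the next step. This is only a schematic under stated assumptions: `rollout`, `toy_step`, the two-dimensional codes, and the scalar actions are hypothetical, and `toy_step` is a trivial placeholder for the learned dynamics model, not a description of it.

```python
from typing import Callable, List, Sequence

Latent = List[float]      # one latent code (hypothetical, low-dimensional)
LatentSet = List[Latent]  # the set of codes describing the scene at one step


def rollout(
    z0: LatentSet,
    actions: Sequence[float],
    step: Callable[[LatentSet, float], LatentSet],
) -> List[LatentSet]:
    """Autoregressive prediction: every output set is fed back as the next input."""
    states = [z0]
    for action in actions:
        states.append(step(states[-1], action))
    return states


def toy_step(z_set: LatentSet, action: float) -> LatentSet:
    """Placeholder dynamics (assumption): shift every code by the action."""
    return [[value + action for value in z] for z in z_set]


z_init = [[0.0, 1.0], [2.0, 3.0]]
states = rollout(z_init, actions=[0.5, 0.5, -1.0], step=toy_step)
```

Because the loop never leaves latent space, decoding to geometry or topology is only needed at the steps one actually wants to inspect or score.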

Paper Structure (17 sections, 14 figures, 2 tables)

Figures (14)

  • Figure 1: Topological Manipulation. Given partial point clouds of the initial and the goal state, DoughNet predicts and scores the outcome of sampled plans. Executing the best found plan successfully recreates the goal.
  • Figure 2: DoughNet Pipeline. We encode the initial partial observation $X_0$ to a set of latent codes $[{\bm{z}}_0]$ using a learned geometry embedding $\Phi$. The given interaction $a_0$ yields the next latent codes $[{\bm{z}}_1]$, which serve as input in subsequent time steps $t\rightarrow T$. The latent codes may be reconstructed into components using a learned topology embedding $\theta$. This allows DoughNet to reconstruct the objects' geometry $\Tilde{X}_t$ at sample locations $[{\bm{z}}_s]$. In addition, we may extract their topology $\Tilde{G}_t$ from the per-component latents $[{\bm{z}}_\theta]$.
  • Figure 3: Data Distribution. The scales of the objects and EE geometry, where test samples are held-out from regions inside and outside the training boundaries.
  • Figure 4: Topology Check. Left: Two components (same or different object) are considered merged iff they remain connected after opposite velocities are applied. Right: A component is considered split into two iff they have no connection after checking.
  • Figure 5: Performance on Simulated Sequences. Mean performance per frame; dashed for the static topology check ($^+$), and solid for dynamic ($^\ddagger$) and predicted ($^*$).
  • ...and 9 more figures