DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Dominik Bauer, Zhenjia Xu, Shuran Song
TL;DR
DoughNet addresses topological manipulation of elastoplastic objects with a topology-aware visual predictive model that operates in latent space. It combines three parts: a Transformer-based shape encoder, a dynamics model that autoregressively predicts changes in geometry and topology, and a geometry decoder that outputs per-component occupancy masks along with a genus classification. Trained on synthetic MLS-MPM data with an explicit topology-checking pipeline, DoughNet outperforms approaches that treat deformation as purely geometric change on long-horizon prediction. It also enables goal-directed planning with a planner based on the cross-entropy method (CEM), including sim-to-real transfer to real robotic setups. The approach targets manipulations where topology, not just geometry, determines success, and the accompanying data generator is intended to support future research on topological manipulation of deformable objects.
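To make the planning step concrete, the following is a minimal, generic sketch of the cross-entropy method, not DoughNet's actual planner. The function `cem_plan`, its parameters, and the quadratic `toy_cost` are hypothetical stand-ins: in a DoughNet-style setup the cost would presumably compare the predicted geometry and topology against the goal, and the action vector would encode quantities such as tool pose and opening width.

```python
import numpy as np

def cem_plan(cost_fn, dim, iters=20, pop=64, n_elite=8, seed=0):
    """Generic cross-entropy method: refit a Gaussian to the lowest-cost samples."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.ones(dim)
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian.
        samples = rng.normal(mean, std, size=(pop, dim))
        costs = np.array([cost_fn(s) for s in samples])
        # Keep the n_elite candidates with the lowest cost.
        elite = samples[np.argsort(costs)[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero-width Gaussian
    return mean

def toy_cost(action):
    # Hypothetical stand-in for "distance between predicted and goal state".
    target = np.array([0.5, -0.25, 1.0])
    return float(np.sum((action - target) ** 2))

best = cem_plan(toy_cost, dim=3)
print(best)
```

With a learned predictive model, each call to the cost function would involve a model rollout, so the population size and iteration count trade planning quality against compute.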
Abstract
Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. Accurately predicting the topological changes that a specific action might cause is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby enables planning of robotic manipulation: selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet significantly outperforms related approaches that consider deformation only as geometrical change.
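The autoregressive set prediction described above can be illustrated with a toy rollout loop. This is a sketch under stated assumptions, not the DoughNet model: `rollout` and `toy_step` are hypothetical names, each object component is represented by one row of a latent array, and the hand-written "split" rule merely stands in for a learned dynamics model that may change the number of components.

```python
import numpy as np

def rollout(step_fn, init_latents, actions):
    """Feed each predicted latent set back in as the input for the next action."""
    states = [init_latents]
    for action in actions:
        states.append(step_fn(states[-1], action))
    return states

def toy_step(latents, action):
    # Hypothetical dynamics: a strong action splits the first component in two,
    # a weak action only shifts the latent codes (a purely geometric change).
    if action > 0.5:
        first = latents[:1] * 0.5
        return np.concatenate([first, first, latents[1:]], axis=0)
    return latents + action

init = np.zeros((1, 4))  # one component with a 4-dimensional latent code
states = rollout(toy_step, init, actions=[0.1, 0.9, 0.2])
component_counts = [s.shape[0] for s in states]
print(component_counts)  # [1, 1, 2, 2]
```

The point of the sketch is that the set size can change between steps, so a topological event such as a split shows up as a change in the number of latent codes and is carried forward through the remaining steps.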
