DeformNet: Latent Space Modeling and Dynamics Prediction for Deformable Object Manipulation
Chenchang Li, Zihao Ai, Tong Wu, Xiaosa Li, Wenbo Ding, Huazhe Xu
TL;DR
DeformNet tackles deformable object manipulation by learning a compact latent representation from RGB-D data, predicting its dynamics with a recurrent state-space model (RSSM), and planning with model-predictive control using iCEM. The representation pairs a PointNet-based latent deformation vector $e_d$ with an appearance vector $e_a$; together they condition a NeRF decoder that renders images from predicted latents. The approach outperforms baselines on six simulated tasks, with favorable geometry and shape metrics, and transfers from simulation to a real UR5 robot. DeformNet thus unifies 3D representation, dynamics learning, and goal-directed planning in a single data-driven framework.
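The planning loop described above (sample action sequences, roll them through a latent dynamics model, refit the sampling distribution to the best ones, execute the first action, replan) can be sketched as follows. This is a minimal illustration using plain CEM, not the iCEM variant the summary names, which adds further refinements on top. The function `toy_step` is a hypothetical stand-in for the learned RSSM, and all names, the cumulative-distance cost, and the hyperparameters are assumptions made for the example rather than the authors' implementation.

```python
import numpy as np

def rollout_cost(z0, actions, goal, step_fn):
    """Cumulative distance to the goal latent for a batch of action sequences.

    actions has shape (num_samples, horizon, action_dim).
    """
    z = np.repeat(z0[None, :], actions.shape[0], axis=0)
    cost = np.zeros(actions.shape[0])
    for t in range(actions.shape[1]):
        z = step_fn(z, actions[:, t])
        cost += np.linalg.norm(z - goal, axis=1)
    return cost

def cem_plan(z0, goal, step_fn, action_dim, horizon=5, num_samples=64,
             num_elites=8, iterations=4, rng=None):
    """Plain CEM: refit a diagonal Gaussian over action sequences to the elites."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        noise = rng.standard_normal((num_samples, horizon, action_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        costs = rollout_cost(z0, actions, goal, step_fn)
        elites = actions[np.argsort(costs)[:num_elites]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor keeps sampling valid
    return mean  # planned action sequence, shape (horizon, action_dim)

def toy_step(z, a):
    """Hypothetical dynamics (NOT the learned RSSM): latent shifts by a scaled action."""
    return z + 0.5 * a

def run_mpc(z0, goal, steps=10, seed=0):
    """Model-predictive control: replan every step, execute only the first action."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(steps):
        plan = cem_plan(z, goal, toy_step, action_dim=z.shape[0], rng=rng)
        z = toy_step(z[None, :], plan[:1])[0]
    return z

if __name__ == "__main__":
    start, goal = np.zeros(2), np.array([1.0, -1.0])
    print("final latent:", run_mpc(start, goal))
```

In a real system the cost would compare predicted latents (or images rendered from them) against a goal observation, and `toy_step` would be replaced by the trained dynamics model.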
Abstract
Manipulating deformable objects is a ubiquitous task in household environments, demanding adequate representation and accurate dynamics prediction due to the objects' infinite degrees of freedom. This work proposes DeformNet, which utilizes latent space modeling with a learned 3D representation model to tackle these challenges effectively. The proposed representation model combines a PointNet encoder and a conditional neural radiance field (NeRF), capturing both object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts how the latent representation evolves over time. Extensive simulation experiments with diverse objectives demonstrate the generalization capabilities of DeformNet across various deformable object manipulation tasks, including ones with previously unseen goals. Finally, we deploy DeformNet on a physical UR5 robotic arm to demonstrate its capability in real-world scenarios.
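For readers unfamiliar with the RSSM mentioned above, its generic form splits the state into a deterministic recurrent part $h_t$ and a stochastic part $z_t$. The equations below are the standard formulation, given only as orientation; the symbols $f_\theta$, $p_\theta$, $q_\phi$, $a_t$, and $o_t$ are notation assumed here, and how DeformNet maps its latents $e_d$ and $e_a$ onto these quantities is not specified in this summary.

$$
\begin{aligned}
h_t &= f_\theta(h_{t-1}, z_{t-1}, a_{t-1}) && \text{(deterministic recurrent state)} \\
z_t &\sim p_\theta(z_t \mid h_t) && \text{(prior, used when predicting ahead)} \\
z_t &\sim q_\phi(z_t \mid h_t, o_t) && \text{(posterior, used when an observation } o_t \text{ is available)}
\end{aligned}
$$

During planning only the first two lines are needed: the model is rolled forward from the current state under candidate actions without access to future observations.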
