DeformNet: Latent Space Modeling and Dynamics Prediction for Deformable Object Manipulation

Chenchang Li, Zihao Ai, Tong Wu, Xiaosa Li, Wenbo Ding, Huazhe Xu

TL;DR

DeformNet addresses deformable object manipulation by learning a compact latent representation from RGB-D data and predicting its dynamics with a recurrent state-space model (RSSM), which enables planning through model-predictive control with the improved cross-entropy method (iCEM). The representation combines a PointNet-based latent deformation vector $e_d$ and an appearance vector $e_a$, which together condition a NeRF decoder that can render images from predicted latents. The approach outperforms baselines in six simulated tasks and shows real-world viability on a UR5 robot, achieving favorable geometry and shape metrics and demonstrating sim-to-real transfer. By unifying 3D representation, dynamics learning, and goal-directed planning under a data-driven framework, DeformNet advances practical deformable-object manipulation.
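The planning loop summarized above can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: `toy_dynamics` is a hypothetical linear stand-in for the learned RSSM, the cost is the Euclidean distance to a goal latent accumulated over the horizon, and the optimizer is the plain cross-entropy method, which omits the sample-efficiency refinements of iCEM such as temporally correlated noise and elite reuse. All function names and hyperparameters are illustrative.

```python
import numpy as np

def toy_dynamics(z, a):
    """Hypothetical stand-in for a learned latent dynamics model (not the paper's RSSM)."""
    return z + 0.1 * a

def cem_plan(z0, z_goal, horizon=5, pop=64, n_elite=8, iters=4, seed=0):
    """Plain CEM: sample action sequences, keep the cheapest, refit a Gaussian."""
    rng = np.random.default_rng(seed)
    act_dim = z0.shape[0]  # toy assumption: action and latent dimensions match
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        noise = rng.standard_normal((pop, horizon, act_dim))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        # Roll every candidate sequence forward in latent space.
        z = np.repeat(z0[None, :], pop, axis=0)
        cost = np.zeros(pop)
        for t in range(horizon):
            z = toy_dynamics(z, actions[:, t])
            cost += np.linalg.norm(z - z_goal, axis=1)
        # Refit the sampling distribution to the lowest-cost sequences.
        elite = actions[np.argsort(cost)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

def mpc(z0, z_goal, steps=20):
    """Receding-horizon control: execute the first planned action, then replan."""
    z = z0.copy()
    for k in range(steps):
        plan = cem_plan(z, z_goal, seed=k)
        z = toy_dynamics(z, plan[0])
    return z

if __name__ == "__main__":
    start = np.zeros(3)
    goal = np.array([0.5, -0.3, 0.2])
    final = mpc(start, goal)
    print("initial distance:", np.linalg.norm(start - goal))
    print("final distance:", np.linalg.norm(final - goal))
```

Replanning at every step is what makes this model-predictive control: only the first action of each optimized sequence is executed, so prediction errors in the dynamics model are corrected by the next observation rather than accumulating over the full horizon.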

Abstract

Manipulating deformable objects is a ubiquitous task in household environments, demanding adequate representation and accurate dynamics prediction due to the objects' infinite degrees of freedom. This work proposes DeformNet, which utilizes latent space modeling with a learned 3D representation model to tackle these challenges effectively. The proposed representation model combines a PointNet encoder and a conditional neural radiance field (NeRF), facilitating a thorough acquisition of object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts the transformation of the latent representation over time. Extensive simulation experiments with diverse objectives demonstrate the generalization capabilities of DeformNet for various deformable object manipulation tasks, even in the presence of previously unseen goals. Finally, we deploy DeformNet on an actual UR5 robotic arm to demonstrate its capability in real-world scenarios.
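As a rough illustration of the encoder side of the representation model, the sketch below applies the core PointNet idea (a shared per-point MLP followed by max pooling) and then splits the resulting embedding into a deformation part and an appearance part. The input layout (xyz plus rgb per point), the layer sizes, the split point, and the random weights are all assumptions made for the example; the paper's actual architecture and dimensions are not given in this section.

```python
import numpy as np

def pointnet_encode(points, w1, w2):
    """Shared per-point MLP followed by max pooling over the point axis."""
    h = np.maximum(points @ w1, 0.0)  # (N, hidden): same weights applied to every point
    h = np.maximum(h @ w2, 0.0)       # (N, latent)
    return h.max(axis=0)              # (latent,): pooling makes the result order-independent

def split_latent(e, deform_dim):
    """Split one embedding into deformation and appearance vectors (e_d, e_a)."""
    return e[:deform_dim], e[deform_dim:]

rng = np.random.default_rng(0)
w1 = rng.standard_normal((6, 32))      # 6 input channels: xyz + rgb (assumed layout)
w2 = rng.standard_normal((32, 16))     # 16-dimensional embedding (assumed size)
cloud = rng.standard_normal((100, 6))  # random points standing in for an RGB-D point cloud

e = pointnet_encode(cloud, w1, w2)
e_d, e_a = split_latent(e, deform_dim=12)  # 12/4 split is illustrative only
```

Because the pooling step takes a maximum over points, shuffling the rows of `cloud` leaves `e` unchanged, which is the property that makes this kind of encoder suitable for unordered point clouds.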

Paper Structure (16 sections, 5 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overview of the DeformNet framework. At each time step, an encoder first transforms images captured from multiple viewpoints into a single embedding. A dynamics model then predicts the embedding that results from a set of sampled actions. Finally, a NeRF decoder generates images conditioned on the predicted embeddings.
  • Figure 2: Overview of the model architecture. (a) The PointNet encoder generates latent features. These latent features are split and utilized to enhance the NeRF. (b) We then employ RSSM to capture the dynamics within the trained latent space. (c) We examine the effect of latent deformation vectors and latent appearance vectors in a fixed dataset with various lighting conditions and object shapes.
  • Figure 3: Simulator visualizations. (a) In the 'Pinch' environment, the robot manipulates plasticine through pinching actions. In the 'Poke' task, the gripper is replaced with a stick. (b) In the 'Write' environment, the robot writes characters on a clay surface. (c) In the 'Push' task, the robot pushes a towel with a stick to achieve specific target shapes. (d) In the 'Transport' and 'Pile' environments, the robot either relocates grains to desired locations or pushes them into predefined forms.
  • Figure 4: Control results and example trajectories by DeformNet. The target configuration is shown in the image on the right. The planning process is illustrated in the left five columns, and the control results obtained from the algorithm are displayed in the sixth column. The five rows correspond to the 'Pinch', 'Push', 'Write', 'Pile', and 'Transport' tasks, respectively. In the 'Pinch' task, the gripper's action is symbolized by a blue dot. In the 'Push' and 'Write' tasks, the actions executed by the stick are illustrated by blue arrows. In the 'Pile' and 'Transport' tasks, the arrows represent the piling direction and distance.
  • Figure 5: Real world experiment setup. Left: an overview of the real robot experiment setup. Right: 3D-printed tools including (a) the mold for resetting, and (b) the platform to fix the dough.
  • ...and 1 more figure
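The dynamics stage described in the captions for Figures 1 and 2(b) can be sketched as an action-conditioned rollout of a recurrent state-space model. An RSSM keeps a deterministic recurrent state `h` alongside a stochastic latent `z`; the toy version below uses a single tanh layer in place of a gated recurrent cell, random untrained weights, and hypothetical dimensions, so it shows only the structure of the prior rollout, not the paper's trained model.

```python
import numpy as np

class TinyRSSM:
    """Toy RSSM prior with a deterministic state h and a stochastic latent z."""

    def __init__(self, stoch=4, deter=8, act=2, seed=0):
        self.rng = np.random.default_rng(seed)
        # Random, untrained weights; all sizes are illustrative assumptions.
        self.w_h = 0.1 * self.rng.standard_normal((deter + stoch + act, deter))
        self.w_mu = 0.1 * self.rng.standard_normal((deter, stoch))
        self.w_sigma = 0.1 * self.rng.standard_normal((deter, stoch))

    def step(self, h, z, a):
        # Deterministic path: h_t = f(h_{t-1}, z_{t-1}, a_{t-1}); tanh replaces a GRU here.
        h = np.tanh(np.concatenate([h, z, a]) @ self.w_h)
        # Stochastic path: z_t is sampled from a Gaussian prior parameterized by h_t.
        mu = h @ self.w_mu
        sigma = np.log1p(np.exp(h @ self.w_sigma)) + 0.1  # softplus keeps sigma positive
        z = mu + sigma * self.rng.standard_normal(mu.shape)
        return h, z

    def imagine(self, h, z, actions):
        """Roll the prior forward over an action sequence without new observations."""
        latents = []
        for a in actions:
            h, z = self.step(h, z, a)
            latents.append(z)
        return np.stack(latents)

model = TinyRSSM()
actions = np.ones((5, 2))                               # five hypothetical 2-D actions
rollout = model.imagine(np.zeros(8), np.zeros(4), actions)  # array of shape (5, 4)
```

In a full pipeline, each row of `rollout` would be the predicted embedding passed to the decoder or scored by the planner; during training, a posterior that also sees the encoder output would be fitted alongside this prior.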