
DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation

Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Jenny Seidenschwarz, Mike Zheng Shou, Deva Ramanan, Shuran Song, Stan Birchfield, Bowen Wen, Jeffrey Ichnowski

TL;DR

DeformGS tackles scene flow tracking of highly deformable objects. It learns a canonical Gaussian representation and a deformation function that maps it to world space, which enables accurate 3D tracking under occlusions and shadows. A neural-voxel encoding and a deformation MLP predict each Gaussian's position, rotation, and a shadow scalar. Training adds physics-inspired regularization (local isometry and conservation of momentum) and per-Gaussian dynamic masks. The method improves 3D tracking by about 56% over the state of the art on synthetic datasets and achieves millimeter-scale tracking on large cloth scenes. It also enables downstream robotics tasks such as creating digital twins and keypoint-guided grasping. The work provides six synthetic cloth scenes and real-world validation on the Robo360 dataset, showing both accuracy gains and practical applicability in deformable object manipulation.
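The following minimal Python sketch makes the interface of the deformation function concrete as a mapping $F(x, y, z, t) \to (x', r', s)$. It is not the authors' implementation. The following are all illustrative assumptions:

- the class `ToyDeformation`;
- a dense 4D feature grid with nearest-cell lookup, standing in for the neural-voxel encoding;
- a single hidden layer;
- additive position and quaternion offsets;
- a sigmoid on the shadow scalar.

The weights are random rather than trained.

```python
import math
import random


class ToyDeformation:
    """Hypothetical stand-in for F(x, y, z, t) -> (x', r', s); weights are untrained."""

    def __init__(self, grid_res=4, feat_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.res = grid_res
        # ASSUMPTION: one feature vector per cell of a dense 4D grid over [0, 1]^4,
        # used here in place of the paper's neural-voxel encoding.
        self.grid = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                     for _ in range(grid_res ** 4)]
        # ASSUMPTION: one hidden ReLU layer; 8 outputs = 3 offset + 4 quaternion + 1 shadow.
        self.w1 = [[rng.gauss(0.0, 0.3) for _ in range(feat_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.3) for _ in range(hidden)] for _ in range(8)]

    def _encode(self, x, y, z, t):
        # Nearest-cell lookup; each coordinate is clamped into the grid.
        idx = 0
        for c in (x, y, z, t):
            cell = min(self.res - 1, max(0, int(c * self.res)))
            idx = idx * self.res + cell
        return self.grid[idx]

    def __call__(self, pos, t, canonical_rot):
        feat = self._encode(*pos, t)
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feat))) for row in self.w1]
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        # Metric position = canonical position + predicted offset.
        new_pos = tuple(p + d for p, d in zip(pos, out[:3]))
        # Rotation = canonical quaternion + predicted delta, renormalized to unit length.
        q = [a + d for a, d in zip(canonical_rot, out[3:7])]
        norm = math.sqrt(sum(v * v for v in q))
        new_rot = tuple(v / norm for v in q)
        # Shadow scalar squashed into (0, 1) with a sigmoid.
        shadow = 1.0 / (1.0 + math.exp(-out[7]))
        return new_pos, new_rot, shadow


F = ToyDeformation()
pos, rot, s = F((0.2, 0.5, 0.7), 0.3, (1.0, 0.0, 0.0, 0.0))
print(pos, rot, s)
```

In a full pipeline, a function like this would be queried for every Gaussian at each training timestamp before rasterization. Here it only illustrates the inputs and outputs.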

Abstract

Teaching robots to fold, drape, or reposition deformable objects such as cloth will unlock a variety of automation applications. While remarkable progress has been made in rigid object manipulation, manipulating deformable objects poses unique challenges, including frequent occlusions, infinite-dimensional state spaces, and complex dynamics. Object pose estimation and tracking have aided robots in rigid manipulation. In the same way, dense 3D tracking (scene flow) of highly deformable objects will enable new applications in robotics and aid existing approaches, such as imitation learning or creating digital twins with real2sim transfer. We propose DeformGS, an approach that recovers scene flow in highly deformable scenes from simultaneous video captures of a dynamic scene by multiple cameras. DeformGS builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for fast, state-of-the-art novel-view synthesis. DeformGS learns a deformation function that projects a set of Gaussians with canonical properties into world space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer each Gaussian's position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on conservation of momentum and isometry, which reduce trajectory errors. We also use the existing foundation models SAM and XMEM to produce noisy masks, and we learn a per-Gaussian mask for better physics-inspired regularization. DeformGS achieves high-quality 3D tracking in highly deformable scenes with shadows and occlusions. In experiments, DeformGS improves 3D tracking by an average of 55.8% compared to the state of the art. With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a 1.5 m × 1.5 m cloth. Website: https://deformgs.github.io
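The two physics-inspired terms can be sketched as losses over three adjacent timesteps. This is a hedged illustration, not the paper's equations. The function names `knn`, `isometry_loss`, and `momentum_loss` are hypothetical, and the sketch makes three assumptions:

- The isometry term compares each Gaussian's current distances to its k nearest canonical neighbors against the canonical distances.
- The momentum term penalizes the discrete acceleration $x_{i+1} - 2x_i + x_{i-1}$, which is zero when velocity is constant.
- The learned per-Gaussian mask is a weight in [0, 1] that switches the terms off for static Gaussians.

```python
import math


def knn(points, k):
    """Indices of the k nearest neighbors of each point (brute force, for illustration)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def isometry_loss(canonical, current, neighbors, mask):
    """Mask-weighted mean change of neighbor distances relative to the canonical state."""
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d_canon = math.dist(canonical[i], canonical[j])
            d_now = math.dist(current[i], current[j])
            total += mask[i] * abs(d_now - d_canon)
            count += 1
    return total / max(count, 1)


def momentum_loss(prev, cur, nxt, mask):
    """Mask-weighted mean norm of the discrete acceleration x_{i+1} - 2 x_i + x_{i-1}."""
    total = 0.0
    for m, a, b, c in zip(mask, prev, cur, nxt):
        accel = [ck - 2.0 * bk + ak for ak, bk, ck in zip(a, b, c)]
        total += m * math.sqrt(sum(v * v for v in accel))
    return total / max(len(mask), 1)


# Toy check: four Gaussians translated rigidly at constant velocity.
canon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
frames = [[(x + 0.1 * k, y, z) for x, y, z in canon] for k in range(3)]
nbrs = knn(canon, k=2)
mask = [1.0] * len(canon)
print(isometry_loss(canon, frames[1], nbrs, mask), momentum_loss(*frames, mask))
```

A rigid translation at constant velocity leaves both terms at (numerically) zero. Stretching the point set raises the isometry term, and changing its velocity between frames raises the momentum term.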

Paper Structure

This paper contains 21 sections, 9 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: We propose DeformGS, a method that improves on state-of-the-art methods for accurate 3D point tracking in highly deformable scenes. This figure shows the rendering and tracking of DeformGS in the six dynamic Blender scenes [blender] used for evaluation. We refer to the scenes in this figure as Scenes 1, 2, 3, 4, 5, and 6, ordered from left to right.
  • Figure 2: DeformGS maps a set of Gaussians with canonical properties to metric space using a deformation function $F$. The deformation function takes the position of a Gaussian $(x,y,z)$ and a queried timestamp $t$ and infers a shadow scalar $s$, rotation $r'$, and metric position $x'$. During training, we use the metric positions and rotations to regularize the deformation function, considering the states at $t \in \{i-1, i, i+1\}$ with Gaussian metric states $P'_{i-1}, P'_{i}, P'_{i+1}$.
  • Figure 3: DeformGS uses three adjacent timesteps at every iteration to enforce physics-inspired regularization terms. All Gaussians are deformed to world space using the deformation function $F$ and rasterized to compute the photometric loss and its gradients. The positions of the Gaussians are used to compute the regularization terms based on local isometry and conservation of momentum (see the section on regularization).
  • Figure 4: Results on Scene 5. Randomly sampled ground-truth trajectories are shown in green, inferred trajectories in red, and the errors between corresponding points as red lines. DeformGS produces smaller 3D tracking errors than the baseline methods.
  • Figure 5: A person manipulating a duvet in the Robo360 dataset [liang2023robo360], reconstructed using DeformGS. The top row shows the 4D Gaussians as point clouds, where the color represents dense correspondences. The bottom row shows rendered views overlaid with 3D trajectories projected into image space.
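Figure 4 visualizes errors between corresponding points on ground-truth and inferred trajectories. The sketch below shows one way a median 3D tracking error and a relative improvement over a baseline could be computed. The function names are hypothetical. This summary does not state the exact metric definitions used in the paper, for example whether errors are pooled over all points and frames, as assumed here.

```python
import math
import statistics


def tracking_errors(pred_tracks, gt_tracks):
    """Euclidean error for every (point, frame) pair of corresponding trajectories."""
    return [math.dist(p, g)
            for pred, gt in zip(pred_tracks, gt_tracks)
            for p, g in zip(pred, gt)]


def median_tracking_error(pred_tracks, gt_tracks):
    """Median over all pooled point/frame errors (ASSUMPTION: pooled, not per-track)."""
    return statistics.median(tracking_errors(pred_tracks, gt_tracks))


def relative_improvement(baseline_error, method_error):
    """Fractional error reduction of a method relative to a baseline."""
    return (baseline_error - method_error) / baseline_error


# Toy data in meters: two tracked points over two frames (not from the paper).
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]]
pred = [[(0.0, 0.0, 0.003), (1.0, 0.0, 0.004)], [(0.0, 1.0, 0.0), (0.0, 2.0, 0.01)]]
print(1000 * median_tracking_error(pred, gt), "mm")
print(relative_improvement(10.0, 4.0))
```

With positions in meters, multiplying by 1000 reports the error in millimeters, the unit of the 3.3 mm figure in the abstract. The 55.8% average in the abstract may be a relative error reduction of this kind, but that is an assumption. If so, it would correspond to `relative_improvement` returning 0.558.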