PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, Yunzhu Li

TL;DR

PhysTwin tackles reconstructing physically plausible digital twins of deformable objects from sparse videos by fusing a spring-mass physics model with generative geometry and Gaussian-based rendering. The method uses a two-stage inverse optimization: first to recover geometry and physical parameters using a TRELLIS shape prior and sparse-to-dense refinement, then to fit appearance via Gaussian kernels with Linear Blend Skinning for deformation-aware rendering. Empirical results show superior reconstruction, resimulation, and unseen-interaction generalization compared with strong baselines, plus real-time forward simulation suitable for interactive control and robotic planning. By uniting perception with physics-based simulation and emphasizing data efficiency, PhysTwin offers a practical path toward robust deformable-object digital twins in robotics and interactive media.
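To make the spring-mass component concrete, here is a minimal sketch of one forward-simulation step for particles connected by damped springs. It is an illustration under stated assumptions, not the authors' implementation: the `Spring` fields, the `step` function, the semi-implicit Euler integrator, and all numeric values are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Spring:
    """Hypothetical spring record: endpoint indices plus physical parameters."""
    i: int
    j: int
    rest_length: float
    stiffness: float
    damping: float


def step(positions, velocities, springs, mass, dt,
         gravity=(0.0, 0.0, -9.81), fixed=frozenset()):
    """Advance all particles by one semi-implicit Euler step (assumed integrator)."""
    n = len(positions)
    # Start every particle with the gravitational force m * g.
    forces = [[mass * g for g in gravity] for _ in range(n)]
    for s in springs:
        d = [positions[s.j][k] - positions[s.i][k] for k in range(3)]
        length = math.sqrt(sum(c * c for c in d))
        if length == 0.0:
            continue  # direction undefined for coincident particles
        u = [c / length for c in d]
        # Relative velocity along the spring drives the damping term.
        rel_v = sum((velocities[s.j][k] - velocities[s.i][k]) * u[k]
                    for k in range(3))
        magnitude = s.stiffness * (length - s.rest_length) + s.damping * rel_v
        for k in range(3):
            forces[s.i][k] += magnitude * u[k]  # pulls i toward j when stretched
            forces[s.j][k] -= magnitude * u[k]  # equal and opposite force on j
    new_positions, new_velocities = [], []
    for p in range(n):
        if p in fixed:
            new_positions.append(list(positions[p]))
            new_velocities.append([0.0, 0.0, 0.0])
            continue
        # Update velocity first, then position with the new velocity.
        v = [velocities[p][k] + dt * forces[p][k] / mass for k in range(3)]
        new_velocities.append(v)
        new_positions.append([positions[p][k] + dt * v[k] for k in range(3)])
    return new_positions, new_velocities


# Usage: a unit mass hanging from a fixed anchor on a single damped spring.
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
springs = [Spring(i=0, j=1, rest_length=1.0, stiffness=100.0, damping=5.0)]
for _ in range(20000):  # 20 simulated seconds at dt = 1 ms
    positions, velocities = step(positions, velocities, springs,
                                 mass=1.0, dt=0.001, fixed=frozenset({0}))
```

With these illustrative values the hanging mass should settle where the spring force balances gravity, that is, at a spring length of about rest_length + mass * 9.81 / stiffness = 1.0981.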

Abstract

Creating a physical digital twin of a real-world object has immense potential in robotics, content creation, and XR. In this paper, we present PhysTwin, a novel framework that uses sparse videos of dynamic objects under interaction to produce a photo- and physically realistic, real-time interactive virtual replica. Our approach centers on two key components: (1) a physics-informed representation that combines spring-mass models for realistic physical simulation, generative shape models for geometry, and Gaussian splats for rendering; and (2) a novel multi-stage, optimization-based inverse modeling framework that reconstructs complete geometry, infers dense physical properties, and replicates realistic appearance from videos. Our method integrates an inverse physics framework with visual perception cues, enabling high-fidelity reconstruction even from partial, occluded, and limited viewpoints. PhysTwin supports modeling various deformable objects, including ropes, stuffed animals, cloth, and delivery packages. Experiments show that PhysTwin outperforms competing methods in reconstruction, rendering, future prediction, and simulation under novel interactions. We further demonstrate its applications in interactive real-time simulation and model-based robotic motion planning.
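The inverse-modeling idea, recovering physical parameters by making simulated motion match observed motion, can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it uses a 1-D mass on a damped spring, synthetic "observations" generated by the same simulator, and a plain grid search as a stand-in optimizer. The names `simulate`, `motion_loss`, and `fit_stiffness` are hypothetical and do not come from the paper.

```python
def simulate(stiffness, damping=0.5, mass=1.0, rest=1.0, x0=1.5,
             dt=0.01, steps=200):
    """Roll out a 1-D mass on a damped spring anchored at the origin."""
    x, v = x0, 0.0
    trajectory = []
    for _ in range(steps):
        force = -stiffness * (x - rest) - damping * v
        v += dt * force / mass  # semi-implicit Euler: velocity first
        x += dt * v             # then position with the updated velocity
        trajectory.append(x)
    return trajectory


def motion_loss(simulated, observed):
    """Mean squared discrepancy between simulated and observed positions."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def fit_stiffness(observed, candidates):
    """Return the candidate stiffness whose rollout best matches the observation."""
    return min(candidates, key=lambda k: motion_loss(simulate(k), observed))


# Synthetic observation from a known stiffness (stands in for tracked video motion).
observed = simulate(stiffness=80.0)
candidates = [float(k) for k in range(10, 201, 5)]
estimate = fit_stiffness(observed, candidates)
print("estimated stiffness:", estimate)
```

Grid search is used here only because it is easy to verify on a single parameter. A real system with many springs would need a scalable optimizer, and real observations would be noisy and partial rather than exact rollouts.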

Paper Structure

This paper contains 18 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: PhysTwin takes sparse videos (three camera views) of deformable objects under interaction as input and reconstructs a simulatable digital twin with complete geometry, high-fidelity appearance, and accurate physical parameters. This enables multiple applications, such as real-time interactive simulation using keyboards and robotic teleoperation devices, as well as model-based robot planning.
  • Figure 2: Overview of Our PhysTwin Framework. The core representation includes geometry, topology, physical parameters (associated with springs and contacts), and Gaussian kernels. To optimize PhysTwin, we minimize the rendering loss and the discrepancy between simulated and observed geometry/motion. The rendering loss optimizes the Gaussian kernels, while the geometry and motion losses refine the overall geometry, topology, and physical parameters in PhysTwin.
  • Figure 3: Qualitative Results on Reconstruction & Resimulation and Future Prediction. We visualize the rendering results of different methods on two tasks. For the reconstruction & resimulation task, our method achieves a better match with the observations. For the future prediction task, our method accurately predicts the future state of the objects. In contrast, the baselines fail in most cases: GS-Dynamics [zhang2024dynamic] tends to remain static, while Spring-Gauss [zhong2024reconstruction] frequently causes the physical model to crash.
  • Figure 4: Qualitative Results on Generalization to Unseen Interactions. We visualize the simulation of a deformable object under unseen interactions using our method and GS-Dynamics [zhang2024dynamic]. The leftmost image shows the interaction used to train the dynamics models, while the images on the right demonstrate their generalization to unseen interactions. Our PhysTwin significantly outperforms prior work.
  • Figure 5: Applications of our PhysTwin. Our constructed PhysTwin supports a variety of tasks, including real-time interactive simulation, which can accept input from either a keyboard or a robot teleoperation setup. PhysTwin also enables model-based robot planning for tasks such as lifting a rope into a specific target configuration.
  • ...and 4 more figures