One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering
Yifan Zhu, Tianyi Xiang, Aaron Dollar, Zherong Pan
TL;DR
This work tackles the challenge of learning physically consistent world models from sparse robotic observations by jointly optimizing geometry, appearance, and physical parameters (GAP) of rigid objects. It introduces a differentiable pipeline that combines Shape-as-Points geometry with a grid-based appearance field, a Poisson-based occupancy representation, and a differentiable marching cubes renderer, all integrated with a differentiable rigid-body simulator. The proposed two-stage real-to-sim optimization leverages a geometry prior from web-scale models to recover plausible object shapes and physical properties from a single push, achieving accurate dynamics parameters and plausible novel-view renderings in both simulated and real environments. This approach yields a physically grounded world model suitable for planning and control in novel environments, with potential extensions to multi-object and richer appearance modeling.
Abstract
Identifying predictive world models from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that use differentiable programming to identify world models cannot jointly optimize the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows these properties to be identified jointly. Our method couples a novel differentiable point-based geometry representation with a grid-based appearance field, which enables differentiable object collision detection and rendering. Combined with a differentiable physics simulator, this representation allows end-to-end optimization of world models from sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn world models that are both simulation-ready and rendering-ready from only one robot action sequence. The code and additional videos are available at our project website: https://tianyi20.github.io/rigid-world-model.github.io/
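To give a feel for what identifying physical parameters through a differentiable simulator means, here is a minimal sketch in plain Python. It is not the authors' pipeline: it is a toy 1D block pushed with a known force, it has no geometry or rendering, and it uses hand-derived sensitivities in place of automatic differentiation. All names (`simulate`, `loss_and_grad`, `identify_friction`) and all constants (force, mass, time step, learning rate) are assumptions chosen for illustration.

```python
G = 9.81  # gravitational acceleration, m/s^2


def simulate(mu, force=5.0, mass=1.0, dt=0.02, steps=50):
    """Roll out a pushed 1D block; return positions and d(position)/d(mu).

    Assumption: the block always slides forward, so the friction force is
    simply -mu * mass * G (no sticking, no direction change).
    """
    x, v = 0.0, 0.0
    dx_dmu, dv_dmu = 0.0, 0.0  # sensitivities carried alongside the state
    xs, grads = [], []
    for _ in range(steps):
        # Semi-implicit Euler step and its derivative with respect to mu.
        v += dt * (force / mass - mu * G)
        dv_dmu += dt * (-G)
        x += dt * v
        dx_dmu += dt * dv_dmu
        xs.append(x)
        grads.append(dx_dmu)
    return xs, grads


def loss_and_grad(mu, observed):
    """Mean squared position error and its gradient with respect to mu."""
    xs, grads = simulate(mu)
    n = len(observed)
    loss = sum((x - o) ** 2 for x, o in zip(xs, observed)) / n
    grad = sum(2.0 * (x - o) * g for x, o, g in zip(xs, observed, grads)) / n
    return loss, grad


def identify_friction(observed, mu_init=0.1, lr=0.1, iters=100):
    """Plain gradient descent on the friction coefficient."""
    mu = mu_init
    for _ in range(iters):
        _, grad = loss_and_grad(mu, observed)
        mu -= lr * grad
    return mu


# Synthetic "observation" of one push, generated with a hidden friction value.
TRUE_MU = 0.3
observed_positions, _ = simulate(TRUE_MU)
mu_est = identify_friction(observed_positions)
print(f"estimated friction coefficient: {mu_est:.4f}")
```

Because this toy trajectory is linear in the friction coefficient, the loss is quadratic and gradient descent should recover a value close to the hidden 0.3 from a single push. The full method described above faces a much harder problem, since gradients must also flow through contact, geometry, and rendering.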
