ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
Meizhong Wang, Wanxin Jin, Kun Cao, Lihua Xie, Yiguang Hong
TL;DR
ContactGaussian-WM learns physics-grounded world models from sparse, contact-rich video data to support robotic planning and simulation. It introduces a unified Gaussian representation for both geometry and appearance, and it enables end-to-end differentiable learning by differentiating through a closed-form physics engine. Training proceeds in two stages: SG-GS initialization (Stage I) and Phys-Geo refinement (Stage II). The paper reports strong generalization in simulation and real-world tests, where the method outperforms data-driven and prior physics-based baselines, and it demonstrates practical use in data synthesis and real-time model predictive control (MPC). The work targets robust sim-to-real transfer and long-horizon prediction in contact-rich environments.
Abstract
Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation. However, existing methods often struggle to accurately model the environment under conditions of data scarcity and complex contact-rich dynamic motion. To address these challenges, we propose ContactGaussian-WM, a differentiable physics-grounded rigid-body world model capable of learning intricate physical laws directly from sparse and contact-rich video sequences. Our framework consists of two core components: (1) a unified Gaussian representation for both visual appearance and collision geometry, and (2) an end-to-end differentiable learning framework that differentiates through a closed-form physics engine to infer physical properties from sparse visual observations. Extensive simulations and real-world evaluations demonstrate that ContactGaussian-WM outperforms state-of-the-art methods in learning complex scenarios, exhibiting robust generalization capabilities. Furthermore, we showcase the practical utility of our framework in downstream applications, including data synthesis and real-time MPC.
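
To make the idea of inferring a physical property by differentiating through a closed-form model concrete, the following toy sketch may help. It is not the paper's engine or representation: it assumes a single block sliding on a flat surface under Coulomb friction, uses hand-derived gradients rather than an autodiff framework, and every name in it (`position`, `dposition_dmu`, `fit_friction`) is hypothetical. It fits a friction coefficient to a handful of observed positions by gradient descent.

```python
# Toy illustration only: a block sliding on a flat surface with Coulomb friction.
# This is an assumed stand-in, not the physics engine described in the paper.

G = 9.81  # gravitational acceleration in m/s^2


def position(mu, v0, t):
    """Closed-form position at time t of a block launched at speed v0."""
    t_stop = v0 / (mu * G)      # time at which friction brings the block to rest
    t_eff = min(t, t_stop)      # the block does not move after it stops
    return v0 * t_eff - 0.5 * mu * G * t_eff ** 2


def dposition_dmu(mu, v0, t):
    """Analytic derivative of position() with respect to mu."""
    t_stop = v0 / (mu * G)
    if t < t_stop:
        return -0.5 * G * t ** 2
    # After stopping, x = v0^2 / (2 * mu * G), so dx/dmu = -v0^2 / (2 * mu^2 * G).
    return -v0 ** 2 / (2.0 * mu ** 2 * G)


def fit_friction(times, observed, v0, mu_init=0.5, lr=0.5, steps=200):
    """Gradient descent on the mean squared position error with respect to mu."""
    mu = mu_init
    for _ in range(steps):
        grad = 0.0
        for t, x_obs in zip(times, observed):
            residual = position(mu, v0, t) - x_obs
            grad += 2.0 * residual * dposition_dmu(mu, v0, t)
        grad /= len(times)
        mu = max(mu - lr * grad, 1e-3)  # keep mu strictly positive
    return mu


if __name__ == "__main__":
    # Sparse synthetic "observations" generated with an assumed true mu of 0.3.
    times = [0.1, 0.2, 0.3, 0.4, 0.5]
    observed = [position(0.3, 2.0, t) for t in times]
    print(fit_friction(times, observed, v0=2.0))
```

In this sketch the gradient flows through the analytic trajectory formula instead of a numerical simulator rollout, which is the sense in which a closed-form model keeps learning end-to-end differentiable. A full system of the kind the abstract describes would also have to handle contact between bodies and a rendering loss on images, neither of which is modelled here.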
