Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning
Amine Elhafsi, Daniel Morton, Marco Pavone
TL;DR
The paper tackles physically grounded robot planning in unstructured environments by combining scene reconstruction, semantic and material understanding, and physics-based prediction. It introduces Scan, Materialize, Simulate (SMS), a framework that uses 3D Gaussian Splatting to reconstruct scene geometry, visual foundation models to segment objects, a vision-language model to infer material properties, and a physics engine to predict action outcomes, enabling generalizable, object-centric planning. The approach is validated on a billiards-inspired manipulation task and a quadrotor landing task, both in simulated domain-transfer experiments and on real hardware, where it shows robust physical reasoning without relying on purely data-driven policies to learn physical dynamics. By integrating differentiable rendering, semantic understanding, and Newtonian mechanics, SMS offers a practical path toward physically grounded robot planning across diverse settings.
Abstract
Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.
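To make the Materialize-then-Simulate idea concrete, here is a minimal toy sketch, not the authors' implementation or task setup. It assumes: a one-dimensional ball on a surface, a hypothetical lookup table `MATERIAL_PROPERTIES` standing in for the vision-language model's material inference (the labels and rolling-friction values are illustrative assumptions), a hand-written friction integrator `simulate_rollout` standing in for a full physics engine, and a simple grid search over strike speeds in `plan_strike` as the planner. The Scan step (3D Gaussian Splatting reconstruction) is omitted; the ball's start position is simply given.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

# Hypothetical stand-in for the VLM material-inference step: a material label
# maps to an assumed rolling-friction coefficient (values are illustrative only).
MATERIAL_PROPERTIES = {
    "felt": {"rolling_friction": 0.01},
    "carpet": {"rolling_friction": 0.05},
}


def simulate_rollout(x0: float, v0: float, mu: float,
                     dt: float = 1e-3, t_max: float = 10.0) -> float:
    """Integrate a 1-D ball decelerating under rolling friction (semi-implicit
    Euler) until it stops or t_max elapses; return its final position in m."""
    x, v, t = x0, v0, 0.0
    while v != 0.0 and t < t_max:
        dv = mu * G * dt  # friction removes this much speed per step
        if abs(v) <= dv:
            v = 0.0  # the ball comes to rest within this step
        else:
            v -= math.copysign(dv, v)  # friction opposes the direction of motion
        x += v * dt
        t += dt
    return x


def plan_strike(start: float, target: float, material: str,
                v_max: float = 2.0, n_candidates: int = 201) -> float:
    """Grid-search candidate strike speeds, predict each outcome with the
    simulator, and return the speed whose final position is closest to target."""
    mu = MATERIAL_PROPERTIES[material]["rolling_friction"]
    best_v, best_err = 0.0, float("inf")
    for i in range(n_candidates):
        v = i * v_max / (n_candidates - 1)
        err = abs(simulate_rollout(start, v, mu) - target)
        if err < best_err:
            best_v, best_err = v, err
    return best_v


if __name__ == "__main__":
    for surface in ("felt", "carpet"):
        print(surface, plan_strike(start=0.0, target=1.0, material=surface))
```

Because the predicted outcome depends on the inferred material, the same goal (move the ball 1 m) yields different actions: roughly 0.44 m/s on the assumed felt versus roughly 0.99 m/s on the assumed carpet, close to the analytic stopping-distance solution v = sqrt(2 mu g d). The planner itself never learns any dynamics; changing the material estimate is enough to adapt it, which mirrors, in miniature, the abstract's point about not re-learning foundational physical dynamics.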
