
Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning

Amine Elhafsi, Daniel Morton, Marco Pavone

TL;DR

The paper tackles physically grounded robot planning in unstructured environments by bridging scene reconstruction, semantic and material understanding, and physics-based prediction. It introduces Scan, Materialize, Simulate (SMS), a framework that uses 3D Gaussian Splatting for geometry, visual foundation models for object segmentation, a vision-language model for material inference, and a physics engine for outcome prediction, enabling generalizable, object-centric planning. The approach is validated on a billiards-inspired manipulation task and a quadrotor landing task, in both simulated domain-transfer and real-hardware experiments. Because SMS plans with an explicit physics model, it does not have to re-learn basic physical dynamics from data, as purely data-driven policies do. By integrating differentiable rendering, semantic understanding, and Newtonian mechanics, SMS offers a practical path toward physically grounded robot planning across diverse settings.

Abstract

Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.
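As a rough illustration of the "materialize" step, the sketch below turns a segmented object's estimated volume and a material label (as a vision-language model might return it) into rigid-body parameters for a simulator. This is a minimal sketch under stated assumptions, not the SMS implementation: `MATERIAL_TABLE`, `SimObject`, and `materialize` are hypothetical names, the densities are approximate textbook values, and the friction and restitution coefficients are illustrative guesses.

```python
from dataclasses import dataclass

# Hypothetical lookup: material label -> (density in kg/m^3, friction coefficient,
# coefficient of restitution). Densities are approximate; the other values are
# illustrative guesses, not values from the paper.
MATERIAL_TABLE = {
    "brass": (8500.0, 0.35, 0.30),
    "marble": (2700.0, 0.50, 0.40),
    "rubber": (1100.0, 0.90, 0.80),
    "wood": (700.0, 0.40, 0.50),
}
DEFAULT_MATERIAL = "wood"  # hypothetical fallback for labels not in the table


@dataclass
class SimObject:
    name: str
    mass: float         # kg
    friction: float     # Coulomb friction coefficient
    restitution: float  # coefficient of restitution, 0 (inelastic) to 1 (elastic)


def materialize(name: str, material: str, volume_m3: float) -> SimObject:
    """Map an inferred material label and an estimated volume to simulator parameters.

    Assumes the object is solid and made of a single material.
    """
    key = material.strip().lower()
    density, friction, restitution = MATERIAL_TABLE.get(key, MATERIAL_TABLE[DEFAULT_MATERIAL])
    return SimObject(name=name, mass=density * volume_m3, friction=friction, restitution=restitution)
```

For example, `materialize("sculpture", "Brass", 0.001)` yields an object of about 8.5 kg. Treating every object as a solid block of one material is a simplifying assumption; a hollow object such as the soccer ball in Figure 2 would need a different mass estimate.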

Paper Structure

This paper contains 25 sections, 19 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Overview of the Scan, Materialize, Simulate (SMS) framework for physics-informed robot action planning. SMS consists of three steps: scanning to build a geometric environment model, materializing to convert this reconstruction into a simulation-ready representation, and simulating to optimize actions in a virtual environment prior to target-environment execution.
  • Figure 2: Billiards scenes. Left: Motion composite images show the actions optimized in the reconstructed virtual environment (top) and subsequently realized in the target environment (bottom) for Scenes A, B, and C (cue and target balls colored yellow and blue, respectively, and the target is indicated by the bullseye pattern). Scene A depicts a direct shot from cue to target ball. Scene B involves a rebound off the heavy brass sculpture. Scene C features a complex double-rebound involving the Buddha statue and the billiard ball. Right: For the more challenging Scene D, SMS finds two distinct solutions, utilizing rebounds off the soccer ball (top) and the marble statue (bottom).
  • Figure 3: Distributions of SMS performance over 30 repeated action optimizations. Baseline results are shown for comparison. Lower is better.
  • Figure 4: Quadrotor approach comparison. Left: SMS optimizes approach paths that avoid the overhanging ledge. Right: The visual-prompting baseline does not account for propeller wash, so its approach paths fly over the overhang. Note that several landing sites are chosen at the edge and corner of the landing platform.
  • Figure 5: Quadrotor hardware demonstration. Left: A direct approach causes the propeller wash to topple the landing platform. Right: SMS optimizes a roundabout trajectory that avoids disturbing the box and enables a successful landing.
  • ...and 5 more figures
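The "simulate" step can be illustrated with a toy version of the billiards task: sample candidate shots (angle and speed), roll each one forward in a simulator, and keep the action whose outcome stops closest to the target. The sketch below assumes a single ball decelerating under a hypothetical rolling-friction coefficient `MU_ROLL`, with none of the collisions or rebounds that Scenes B to D rely on, and uses the cross-entropy method as a stand-in optimizer. The paper's physics engine and optimizer may differ; `simulate`, `cost`, and `optimize_shot` are hypothetical names.

```python
import math
import random

G = 9.81        # gravitational acceleration, m/s^2
MU_ROLL = 0.05  # hypothetical rolling-friction coefficient (assumption)


def simulate(start, angle, speed, dt=0.01):
    """Toy forward model: the ball travels in a straight line and decelerates
    at MU_ROLL * G until it stops. Returns the final (x, y) position."""
    x, y = start
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    dv = MU_ROLL * G * dt  # speed lost per time step
    while True:
        v = math.hypot(vx, vy)
        if v <= dv:
            return x, y
        scale = (v - dv) / v
        vx, vy = vx * scale, vy * scale
        x += vx * dt
        y += vy * dt


def cost(start, target, action):
    """Distance between where the ball stops and the target (lower is better)."""
    fx, fy = simulate(start, *action)
    return math.hypot(fx - target[0], fy - target[1])


def optimize_shot(start, target, iters=30, samples=64, elites=8, seed=0):
    """Cross-entropy method over (angle in rad, speed in m/s)."""
    rng = random.Random(seed)
    mean = [0.0, 1.0]
    std = [math.pi, 1.0]
    for _ in range(iters):
        actions = []
        for _ in range(samples):
            a = rng.gauss(mean[0], std[0])
            a = math.atan2(math.sin(a), math.cos(a))  # wrap angle to (-pi, pi]
            s = max(0.0, rng.gauss(mean[1], std[1]))  # speeds cannot be negative
            actions.append((a, s))
        # Keep the lowest-cost samples and refit the sampling distribution to them.
        actions.sort(key=lambda act: cost(start, target, act))
        best = actions[:elites]
        for i in range(2):
            vals = [act[i] for act in best]
            mean[i] = sum(vals) / elites
            var = sum((v - mean[i]) ** 2 for v in vals) / elites
            std[i] = max(1e-3, math.sqrt(var))
    best_action = (mean[0], mean[1])
    return best_action, cost(start, target, best_action)
```

Calling `optimize_shot((0.0, 0.0), (1.0, 0.5))` should return a shot that stops the ball within a few centimeters of the target. Rerunning with different `seed` values gives a spread of final errors, which is the kind of distribution Figure 3 summarizes over 30 repeated action optimizations.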