Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions

Kaifeng Zhang, Shuo Sha, Hanxiao Jiang, Matthew Loper, Hyunjong Song, Guangyan Cai, Zhuo Xu, Xiaochen Hu, Changxi Zheng, Yunzhu Li

TL;DR

This work presents a real-to-sim framework for evaluating robotic manipulation policies on deformable objects. It builds soft-body digital twins from real videos, models their motion with PhysTwin-based dynamics, and renders the scene photorealistically with Gaussian Splatting. By addressing appearance and dynamics fidelity together, the method achieves strong sim-to-real correlation (Pearson r > 0.9) across plush toy packing, rope routing, and T-block pushing. It also outperforms an IsaacLab baseline on the tasks that baseline models (rope routing and T-block pushing). The approach enables reproducible, scalable policy evaluation and the selection of promising checkpoints without co-training in simulation. Ablation studies show that both color alignment and physics optimization are crucial for reliably predicting real-world performance, which offers concrete guidance for future simulation-based benchmarking. Overall, the work shows practical potential for accelerating robotics research by providing simulated evaluators whose outcomes closely track real-world results.
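The sim-to-real correlation quoted above is a Pearson coefficient between simulated and real-world success rates, taken over policies and checkpoints. A minimal sketch of that computation, plus picking the checkpoint with the highest simulated success rate, is shown below. The success rates are hypothetical placeholders rather than numbers from the paper, and the helper names (`pearson_r`, `select_checkpoint`) are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def select_checkpoint(sim_success):
    """Return the checkpoint with the highest simulated success rate."""
    return max(sim_success, key=sim_success.get)


# Hypothetical success rates (fraction of successful episodes per checkpoint);
# illustrative only, not results reported in the paper.
sim = {"ckpt_10k": 0.20, "ckpt_20k": 0.45, "ckpt_30k": 0.70, "ckpt_40k": 0.65}
real = {"ckpt_10k": 0.15, "ckpt_20k": 0.50, "ckpt_30k": 0.75, "ckpt_40k": 0.60}

names = sorted(sim)  # fixed order so sim and real values stay paired
r = pearson_r([sim[k] for k in names], [real[k] for k in names])
print(f"Pearson r = {r:.3f}")
print("selected checkpoint:", select_checkpoint(sim))
```

`scipy.stats.pearsonr` computes the same coefficient. The standard-library version here just avoids the dependency. Selecting by simulated success is only meaningful when the correlation with real-world success is high, which is the property the framework is designed to establish.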

Abstract

Robotic manipulation policies are advancing rapidly, but their direct evaluation in the real world remains costly, time-consuming, and difficult to reproduce, particularly for tasks involving deformable objects. Simulation provides a scalable and systematic alternative, yet existing simulators often fail to capture the coupled visual and physical complexity of soft-body interactions. We present a real-to-sim policy evaluation framework that constructs soft-body digital twins from real-world videos and renders robots, objects, and environments with photorealistic fidelity using 3D Gaussian Splatting. We validate our approach on representative deformable manipulation tasks, including plush toy packing, rope routing, and T-block pushing, demonstrating that simulated rollouts correlate strongly with real-world execution performance and reveal key behavioral patterns of learned policies. Our results suggest that combining physics-informed reconstruction with high-quality rendering enables reproducible, scalable, and accurate evaluation of robotic manipulation policies. Website: https://real2sim-eval.github.io/
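Evaluation in this framework amounts to running a trained policy in closed loop inside the simulator and scoring each episode with a binary success signal. Figure 2 describes the simulator as exposed through a Gym-style API. The sketch below assumes the Gymnasium step convention (the maintained successor to OpenAI Gym), in which `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. The environment `ToyPushEnv`, its one-dimensional state, the `info["success"]` flag, and `proportional_policy` are all hypothetical stand-ins, not the paper's Gaussian Splatting simulator or its interface.

```python
import random


class ToyPushEnv:
    """Hypothetical stand-in for a Gym-style simulator (not the paper's env).

    Follows the Gymnasium convention: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    The 1-D object position is purely illustrative.
    """

    def __init__(self, goal=1.0, tol=0.05, max_steps=50):
        self.goal, self.tol, self.max_steps = goal, tol, max_steps

    def reset(self, seed=None):
        self.rng = random.Random(seed)  # per-episode RNG for reproducibility
        self.pos = self.rng.uniform(-1.0, 0.0)
        self.t = 0
        return self.pos, {}

    def step(self, action):
        # The object moves by the commanded amount plus small Gaussian noise.
        self.pos += action + self.rng.gauss(0.0, 0.02)
        self.t += 1
        success = abs(self.pos - self.goal) < self.tol
        truncated = self.t >= self.max_steps
        # Episode terminates on success; reward is 1.0 only on success.
        return self.pos, float(success), success, truncated, {"success": success}


def evaluate(policy, env, n_episodes=50, base_seed=0):
    """Roll out `policy` for n_episodes and return the binary success rate."""
    successes = 0
    for i in range(n_episodes):
        obs, info = env.reset(seed=base_seed + i)
        terminated = truncated = False
        while not (terminated or truncated):
            obs, _, terminated, truncated, info = env.step(policy(obs))
        successes += int(info.get("success", False))
    return successes / n_episodes


def proportional_policy(obs, gain=0.5, goal=1.0):
    """Toy policy: push a fraction of the remaining distance to the goal."""
    return gain * (goal - obs)


env = ToyPushEnv()
print("success rate:", evaluate(proportional_policy, env))
```

Seeding each episode with a fixed value makes repeated evaluations of the same policy return identical success rates. This mirrors the reproducibility that simulation offers over real-world trials.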

Paper Structure

This paper contains 53 sections, 2 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Real-to-sim policy evaluation with Gaussian Splatting simulation. Left: Correlation between simulated and real-world success rates across multiple policies (ACT zhao2023learning, DP chi2023diffusionpolicy, Pi-0 black2024pi0, SmolVLA shukor2025smolvla) shows that our simulation reliably predicts real-world performance. Right: Representative tasks used for evaluation, including plush toy packing, rope routing, and T-block pushing, are visualized in both real and simulated settings. Our framework reconstructs soft-body digital twins from real-world videos and achieves realistic appearance and motion, enabling scalable and reproducible policy assessment.
  • Figure 2: Proposed framework for real-to-sim policy evaluation. We present a pipeline that evaluates real-world robot policies in simulation using Gaussian Splatting-based rendering and soft-body digital twins. Policies are first trained on demonstrations collected by the real robot, and a phone scan of the workspace is used to reconstruct the scene via Gaussian Splatting. The reconstruction is segmented into robot, objects, and background, then aligned in position and color to enable photorealistic rendering. For dynamics, we optimize soft-body digital twins from object interaction videos to accurately reproduce real-world behavior. The resulting simulation is exposed through a Gym-style API brockman2016openaigym, allowing trained policies to be evaluated efficiently. Compared with real-world trials, this simulator is cheaper, reproducible, and scalable, while maintaining strong correlation with real-world performance.
  • Figure 3: Correlation between simulation and real-world policy performance. Left: Simulation success rates ($y$-axis) vs. real-world success rates ($x$-axis) for toy packing, rope routing, and T-block pushing, across multiple state-of-the-art imitation learning policies and checkpoints. The tight clustering along the diagonal indicates that, even with binary success metrics, our simulator faithfully reproduces real-world behaviors across tasks and policy robustness levels. Right: Compared with IsaacLab, which models the rope routing and push-T tasks, our approach yields substantially stronger sim-to-real correlation, highlighting the benefit of realistic rendering and dynamics.
  • Figure 4: Per-policy, per-task performance across training. $x$-axis: training iterations; $y$-axis: success rates. Simulation (blue) and real-world (orange) success rates are shown across iterations. Unlike Figure 3, which aggregates across policies, this figure shows unrolled curves for each task-policy pair. Improvements in simulation consistently correspond to improvements in the real world, establishing a positive correlation and demonstrating that our simulator can be a reliable tool for evaluating and selecting policies.
  • Figure 5: Comparison of rendering and dynamics quality. Real-world observations (left) compared with our method, two ablations, and the IsaacLab baseline across three tasks. From right to left, visual and physical fidelity progressively improve. Without physics optimization, object dynamics deviate, causing failures such as the toy’s limbs not fitting into the box or the rope slipping before routing. Without color alignment, rendered images exhibit noticeable color mismatches. The IsaacLab baseline (rightmost) shows lower realism in both rendering and dynamics compared to our approach.
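Figures 2 and 5 describe aligning the reconstruction's colors to the real camera and show that skipping this step leaves visible color mismatches. The exact alignment procedure is not given in this summary, so the sketch below swaps in a simple assumption: a per-channel affine map (gain and bias) fit by least squares on corresponding rendered and real pixel values. The function names, the sample pixels, and the assumption that correspondences come from the same calibrated viewpoint are all hypothetical.

```python
def fit_channel_affine(src, dst):
    """Least-squares fit of dst ~ gain * src + bias for one color channel."""
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need paired samples with at least 2 items")
    ms, md = sum(src) / n, sum(dst) / n
    var = sum((s - ms) ** 2 for s in src)
    if var == 0:
        raise ValueError("source channel has no variation")
    gain = sum((s - ms) * (d - md) for s, d in zip(src, dst)) / var
    return gain, md - gain * ms


def fit_color_alignment(rendered, observed):
    """Fit one (gain, bias) pair per RGB channel.

    rendered, observed: equal-length lists of (r, g, b) tuples in [0, 1],
    assumed to be corresponding pixels (same calibrated viewpoint).
    """
    return [
        fit_channel_affine([p[c] for p in rendered], [q[c] for q in observed])
        for c in range(3)
    ]


def apply_color_alignment(params, pixel):
    """Map a rendered (r, g, b) pixel toward the real camera's colors, clipped to [0, 1]."""
    return tuple(
        min(1.0, max(0.0, gain * v + bias)) for (gain, bias), v in zip(params, pixel)
    )


# Hypothetical samples: the "real" camera applies a per-channel gain and
# offset relative to the renderer.
rendered = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.2), (0.3, 0.6, 0.9)]
observed = [(0.9 * r + 0.05, 1.1 * g - 0.02, 0.8 * b + 0.1) for r, g, b in rendered]

params = fit_color_alignment(rendered, observed)
print([(round(g, 3), round(b, 3)) for g, b in params])
print(apply_color_alignment(params, (0.5, 0.5, 0.5)))
```

A per-channel affine model is about the simplest correction that absorbs exposure and white-balance differences. Richer choices, such as a full 3×3 color matrix, would follow the same fit-then-apply pattern.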