SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting

Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhisesh Silwal

TL;DR

SplatSim introduces Gaussian Splatting as a photorealistic rendering primitive within existing simulators to bridge the RGB Sim2Real gap for manipulation policies. By aligning robot and object Gaussians through ICP and forward kinematics, the framework renders high-fidelity synthetic trajectories used to train diffusion-based policies with augmentations, enabling zero-shot deployment to the real world. Across four tasks, SplatSim achieves an average zero-shot success of 86.25%, approaching Real2Real performance while drastically reducing data collection effort via automated simulation demonstrations. The work highlights the viability of RGB-only, zero-shot transfer for contact-rich manipulation and outlines future extensions to deformable objects and more dynamic skills.

Abstract

Sim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data. Videos can be found on our project page: https://splatsim.github.io

Paper Structure (27 sections, 4 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: We employ Gaussian Splatting (Kerbl et al., 2023) as the primary rendering primitive within existing simulation environments to generate highly photorealistic synthetic data for robotic manipulation tasks. Our framework retains the traditional advantages of simulators, including scalability, cost-efficiency, and safety, while enhancing visual realism. Policies trained exclusively on this synthetic data transfer zero-shot to real-world scenarios, achieving performance comparable to that of policies trained on real-world datasets.
  • Figure 2: Top: Our proposed SplatSim framework. Expert demonstrations are collected (a) in a physics simulator (PyBullet). In our case, these demonstrations come either from human experts (teleoperation via Gello; Wu et al., 2023) or from a motion planner that uses privileged information. The trajectories from the simulator are then fed to the simulator-aligned splat models of the scene and the object (b). We transform the 3D Gaussians of the static Gaussian Splat models, as described in the section on robot splat models, to render photorealistic images of the scene at novel joint and object poses; these serve as the RGB state observations for the diffusion policy. Along with these RGB observations, the diffusion policy (Chi et al., 2023) also takes the end-effector position and orientation as input. We augment the end-effector states as well. Bottom: Once trained on the simulated data, the policy is frozen and deployed directly in the real-world setting.
  • Figure 3: The robot is visualized in a static scene by first creating a Gaussian splat of the scene with the robot in its home position. The robot's point cloud is manually segmented and aligned with the canonical robot frame using the ICP algorithm. Each robot link is then segmented, and forward kinematics transformations are applied, enabling the rendering of the robot at arbitrary joint configurations.
  • Figure 4: We use a KNN-based classifier for segmenting links for articulated objects like parallel jaw grippers. We train a KNN model with the ground truth point labeling from the URDF model of the end effector.
  • Figure 5: SplatSim Rollout: Renderings from our SplatSim framework across four different manipulation tasks.
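
The captions for Figures 3 and 4 describe two steps that are easier to follow in code: labeling each Gaussian with a robot link using a KNN classifier trained on URDF-derived point labels, and then moving each link's Gaussians with that link's forward-kinematics transform. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the splat has already been aligned to the robot frame (the ICP step), that each link's pose is available as a 4x4 homogeneous matrix, and that a Gaussian is represented only by a mean and a 3x3 covariance. The function names, the toy points, and the choice of k are all hypothetical.

```python
import numpy as np


def knn_link_labels(query_pts, labeled_pts, labels, k=3):
    """Label each query point by majority vote over its k nearest labeled points."""
    # Pairwise squared distances, shape (num_query, num_labeled).
    d2 = ((query_pts[:, None, :] - labeled_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = labels[nearest]  # shape (num_query, k)
    n_links = int(labels.max()) + 1
    counts = np.stack([(votes == j).sum(axis=1) for j in range(n_links)], axis=1)
    return counts.argmax(axis=1)


def transform_gaussians(means, covs, link_ids, link_transforms):
    """Rigidly move each Gaussian with the transform of the link it belongs to."""
    new_means = means.copy()
    new_covs = covs.copy()
    for j, T in enumerate(link_transforms):
        R, t = T[:3, :3], T[:3, 3]
        mask = link_ids == j
        # Means transform as points: p' = R p + t.
        new_means[mask] = means[mask] @ R.T + t
        # Covariances rotate with the link: S' = R S R^T.
        new_covs[mask] = R @ covs[mask] @ R.T
    return new_means, new_covs


# Toy data (assumed): points sampled from two links of a URDF mesh, with link labels.
labeled_pts = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],   # link 0
    [1.0, 0.0, 0.0], [1.1, 0.0, 0.0], [1.0, 0.1, 0.0],   # link 1
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Two Gaussians from the aligned splat, one near each link.
means = np.array([[0.05, 0.02, 0.0], [0.98, 0.03, 0.0]])
covs = np.stack([np.diag([0.01, 0.04, 0.09])] * 2)

link_ids = knn_link_labels(means, labeled_pts, labels, k=3)

# Link 0 stays put; link 1 rotates 90 degrees about z and lifts by 0.5.
T0 = np.eye(4)
T1 = np.eye(4)
T1[:3, :3] = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T1[:3, 3] = [0.0, 0.0, 0.5]

new_means, new_covs = transform_gaussians(means, covs, link_ids, [T0, T1])
```

In this toy setup, the Gaussian assigned to link 1 moves with T1 while the one on link 0 is unchanged. A real splat also stores opacity and color terms per Gaussian, which this sketch leaves out.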