RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator

Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang

TL;DR

RoboGSim tackles the data bottleneck in policy learning for robot manipulation by combining Real2Sim2Real synthesis with a high-fidelity Gaussian-based scene representation. The system reconstructs scenes with 3D Gaussian Splatting, builds digital twins in Isaac Sim, and renders novel views, objects, and trajectories; it also supports closed-loop evaluation of policies in a physics-consistent setting. Experiments show that policies trained on RoboGSim data transfer zero-shot to real robots and, for novel views and scenes, even outperform policies trained on real data, while reducing data collection cost. The work offers a practical, fair, and online evaluation platform for policy learning and helps narrow the sim2real gap.

Abstract

Efficient acquisition of real-world embodied data has become increasingly critical. However, large-scale demonstrations captured by remote operation are extremely costly and do not scale up the data size efficiently. Sampling episodes in a simulated environment is a promising route to large-scale collection, but existing simulators fail to model texture and physics with high fidelity. To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine. RoboGSim consists of four main parts: Gaussian Reconstructor, Digital Twins Builder, Scene Composer, and Interactive Engine. It can synthesize simulated data with novel views, objects, trajectories, and scenes. RoboGSim also provides online, reproducible, and safe evaluation of different manipulation policies. The real2sim and sim2real transfer experiments show high consistency in texture and physics. We compared models trained on RoboGSim data with models trained on real robot data, testing both on RoboGSim and on the real robot platform. The experimental results show that the model trained on RoboGSim data transfers zero-shot to the real robot, with results comparable to those of the model trained on real robot data. Additionally, in experiments with novel perspectives and novel scenes, the model trained on RoboGSim data performed even better on the real robot than the model trained on real robot data. This not only helps reduce the sim2real gap but also addresses the limitations of real robot data collection, such as its single-source nature and high cost. We hope RoboGSim serves as a closed-loop simulator for fair comparison of policy learning methods. More information can be found on our project page https://robogsim.github.io/.

Paper Structure

This paper contains 17 sections, 15 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: RoboGSim is an efficient, low-cost interactive platform with high-fidelity rendering. It synthesizes demonstrations with novel scenes, novel objects, and novel views, facilitating data scaling for policy learning. Additionally, it can perform closed-loop simulation for safe, fair, and realistic evaluation of different policy models.
  • Figure 2: Overview of the RoboGSim Pipeline: (1) Inputs: multi-view RGB image sequences and MDH parameters of the robotic arm. (2) Gaussian Reconstructor: reconstructs the scene and objects using 3DGS, segments the robotic arm, and builds an MDH kinematic drive graph structure for accurate arm motion modeling. (3) Digital Twins Builder: performs mesh reconstruction of both the scene and objects, then creates a digital twin in Isaac Sim, ensuring high fidelity in simulation. (4) Scene Composer: combines the robotic arm and objects in the simulation, identifies optimal test viewpoints using tracking, and renders images from new perspectives. (5) Interactive Engine: (i) The synthesized images with novel scenes/views/objects are used for policy learning. (ii) Policy networks can be evaluated in a closed-loop manner. (iii) The embodied data can be collected with VR/Xbox equipment.
  • Figure 3: Real2Sim Novel Pose Synthesis: "Real" represents the capture of the real robotic arm from a new viewpoint. "RoboGSim" shows the rendering of the novel pose from the new viewpoint, driven by the real recorded trajectory. "Depth" shows the depth rendered by GS. "Diff" is the difference calculated between the Real and the rendered RGB images. We compute the pixel distance of the same point between Real and RoboGSim, which is 7.37 pixels.
  • Figure 4: Sim2Real Trajectory Replay: The "Sim" row displays the video sequence collected from Isaac Sim. "Real" represents the demonstration driven by the trajectory in simulation. "RoboGSim" is the GS rendering result driven by the same trajectory. "Diff" indicates the differences between Real and the rendered results.
  • Figure 5: Novel Scene Synthesis: We show the results of the physical migration of the robot arm to new scenes, including a factory, a shelf, and two outdoor environments. The high-fidelity multi-view renderings demonstrate that RoboGSim enables the robot arm to operate seamlessly across diverse scenes.
  • ...and 4 more figures
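
The Figure 2 caption mentions driving the arm from its MDH (modified Denavit-Hartenberg) parameters. The sketch below shows how a table of MDH rows can be chained into per-link poses, assuming the standard modified-DH convention (each row is alpha_{i-1}, a_{i-1}, d_i, theta offset). The function names, the two-joint `PLANAR_ARM` table, and the `move_points` helper are hypothetical illustrations, not RoboGSim's actual code or the parameters of the arm used in the paper.

```python
import numpy as np

def mdh_transform(alpha_prev, a_prev, d, theta):
    """4x4 transform from frame i-1 to frame i in the modified-DH convention."""
    ca, sa = np.cos(alpha_prev), np.sin(alpha_prev)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a_prev],
        [st * ca,  ct * ca, -sa,  -sa * d],
        [st * sa,  ct * sa,  ca,   ca * d],
        [0.0,      0.0,      0.0,  1.0],
    ])

def forward_kinematics(mdh_rows, joint_angles):
    """Return the pose of every link frame expressed in the base frame."""
    pose = np.eye(4)
    poses = []
    for (alpha_prev, a_prev, d, theta_offset), q in zip(mdh_rows, joint_angles):
        pose = pose @ mdh_transform(alpha_prev, a_prev, d, theta_offset + q)
        poses.append(pose)
    return poses

def move_points(pose, points):
    """Rigidly move an (N, 3) array of points (e.g. the centers of the
    Gaussians segmented for one link; an assumption) by a 4x4 link pose."""
    rotation, translation = pose[:3, :3], pose[:3, 3]
    return points @ rotation.T + translation

# Hypothetical 2-joint planar arm with a first link of length 1.
PLANAR_ARM = [
    (0.0, 0.0, 0.0, 0.0),  # joint 1 at the base origin
    (0.0, 1.0, 0.0, 0.0),  # joint 2 sits 1 unit along link 1
]

if __name__ == "__main__":
    link_poses = forward_kinematics(PLANAR_ARM, [np.pi / 2, 0.0])
    print(np.round(link_poses[1][:3, 3], 3))  # origin of the second link frame
```

Chaining the transforms in order means each link's pose depends only on the joints before it, which is what would let a segmented link be moved as one rigid group when the joint angles change.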
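
The Figure 3 caption reports a "Diff" image and a pixel distance between the same point in the real and rendered views. One plausible way to compute both is sketched below, assuming an absolute per-pixel difference and a Euclidean distance between 2D pixel coordinates; the captions do not state the exact definitions, and the function names here are hypothetical.

```python
import numpy as np

def diff_image(real_rgb, rendered_rgb):
    """Absolute per-pixel difference between two uint8 RGB images of equal shape."""
    if real_rgb.shape != rendered_rgb.shape:
        raise ValueError("images must have the same shape")
    # Widen to a signed type first so the subtraction cannot wrap around.
    real = real_rgb.astype(np.int16)
    rendered = rendered_rgb.astype(np.int16)
    return np.abs(real - rendered).astype(np.uint8)

def pixel_distance(point_real, point_rendered):
    """Euclidean distance, in pixels, between two (u, v) image coordinates."""
    delta = np.asarray(point_real, dtype=float) - np.asarray(point_rendered, dtype=float)
    return float(np.linalg.norm(delta))

if __name__ == "__main__":
    # Synthetic stand-ins for a captured frame and a rendered frame.
    real = np.full((4, 4, 3), 250, dtype=np.uint8)
    rendered = np.full((4, 4, 3), 10, dtype=np.uint8)
    print(diff_image(real, rendered).max())
    print(pixel_distance((120, 80), (123, 84)))
```

A small distance for a tracked point, such as a marker on the gripper, would indicate that the rendered arm lines up with the real one at that pose.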