Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting

Jian Tang; Pu Pang; Haowen Sun; Chengzhong Ma; Xingyu Chen; Hua Huang; Xuguang Lan

TL;DR

This work tackles the persistent sim-to-real gap in robotic manipulation by introducing Semantic 2D Gaussian Splatting (S2GS), a representation that extracts object-centric, domain-invariant spatial features from multi-view data. S2GS combines 3D/2D Gaussian splatting with hierarchical semantic extraction, semantic feature rendering, and dynamic scene updating to provide clean inputs for a diffusion-based policy, improving transfer from ManiSkill simulation to real UR5 robots. The approach achieves high success rates in real-world manipulation tasks and outperforms RGB-based baselines and 3D Gaussian methods in both appearance fidelity and transfer reliability. The resulting representation is practical, real-time, and editable, which reduces the engineering effort needed for sim-to-real transfer. Incorporating additional domain-invariant cues such as surface normals is left to future work.

Abstract

Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are used during policy training in simulation, and the same features can be extracted and provided as input to the policy during real-world deployment, the domain gap can be effectively bridged, leading to significantly improved policy generalization. Accordingly, we propose Semantic 2D Gaussian Splatting (S2GS), a novel representation method that extracts object-centric, domain-invariant spatial features. S2GS constructs multi-view 2D semantic fields and projects them into a unified 3D space via feature-level Gaussian splatting. A semantic filtering mechanism removes irrelevant background content, ensuring clean and consistent inputs for policy learning. To evaluate the effectiveness of S2GS, we adopt Diffusion Policy as the downstream learning algorithm and conduct experiments in the ManiSkill simulation environment, followed by real-world deployment. Results demonstrate that S2GS significantly improves sim-to-real transferability, maintaining high and stable task performance in real-world scenarios.
