RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications
Xingyu Liu, Chenyangguang Zhang, Gu Wang, Ruida Zhang, Xiangyang Ji
TL;DR
RaSim tackles the depth-domain sim-to-real gap by simulating RealSense D400-style depth sensors and introducing a range-aware rendering strategy that leverages IR cues at near range and RGB cues at far range. It builds a large-scale, photorealistic synthetic RGB-D dataset and trains SDRNet to restore ground-truth depth, while also pre-training depth branches of Transformer backbones to boost real-world tasks. On depth completion (ClearGrasp) and depth-based pose estimation (YCB-V), models trained solely on RaSim achieve competitive or superior performance without finetuning, demonstrating strong cross-domain transfer. The work highlights the practical impact of depth-focused synthetic data for real-world RGB-D perception and points toward expanding RaSim to additional sensors and applications.
Abstract
In robotic vision, a de facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. A range-aware rendering strategy is further introduced to enrich data diversity. Extensive experiments show that models trained with RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks.
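To make "imitating the imaging principle" concrete, the sketch below shows the standard stereo triangulation relation that stereo depth sensors such as the RealSense D400 family rely on: depth z = f * b / d, where f is the focal length in pixels, b the stereo baseline, and d the disparity. It is not RaSim's implementation. The function name `simulate_stereo_depth` and all parameter values (focal length, baseline, sub-pixel step, minimum disparity) are illustrative assumptions chosen only to show how quantized disparity produces range-dependent depth error.

```python
def simulate_stereo_depth(depth_m, focal_px=640.0, baseline_m=0.05,
                          subpixel=0.125, min_disp_px=1.0):
    """Round-trip a ground-truth depth (meters) through quantized disparity.

    All default values are hypothetical, not taken from the paper.
    Returns 0.0 for invalid measurements, a common convention for
    missing depth pixels.
    """
    if depth_m <= 0:
        return 0.0
    # Ideal disparity from triangulation: d = f * b / z
    disparity = focal_px * baseline_m / depth_m
    # A stereo matcher resolves disparity only to a finite sub-pixel step
    disparity_q = round(disparity / subpixel) * subpixel
    # Very small disparities (far objects) are treated as unmeasurable
    if disparity_q < min_disp_px:
        return 0.0
    # Back to depth: z = f * b / d
    return focal_px * baseline_m / disparity_q


if __name__ == "__main__":
    for z in (0.5, 1.0, 3.0, 100.0):
        print(z, simulate_stereo_depth(z))
```

Because depth is inversely proportional to disparity, a fixed disparity step translates into a depth error that grows with distance, which is one reason simulated depth behaves differently at near and far range.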
