A Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation
Lu Shi, Yuxuan Xu, Shiyu Wang, Jinhao Huang, Wenhao Zhao, Yufei Jia, Zike Yan, Weibin Gu, Guyue Zhou
TL;DR
This work tackles the persistent sim-to-real gap in robotic policy transfer by introducing a Real-Sim-Real (RSR) loop that jointly tunes a differentiable simulator and retrains policies. A key contribution is the adaptive InfoGap loss, which combines a task objective with information-theoretic terms to bias data collection toward informative real-world samples and to reduce dataset bias across iterations. Implemented on MuJoCo MJX and evaluated on 6-DOF robotic manipulation tasks, the approach substantially lowers the divergence between simulated and real dynamics (as measured by distributional metrics) and yields better real-world performance and generalization. The framework offers a general, data-efficient pathway for transferring policies from simulation to real robots and can be extended to aerial robotics and other dynamic environments.
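The Real-to-Sim half of the loop can be illustrated with a toy example. The sketch below is a minimal illustration and is not the authors' MJX implementation. It assumes a hypothetical one-dimensional, unit-mass spring-damper whose damping coefficient stands in for an unknown simulation parameter. It differentiates the rollout with hand-propagated forward-mode sensitivities instead of JAX autodiff. A rollout with a hidden damping value stands in for "real" data, and the parameter is recovered with Gauss-Newton steps on the trajectory mismatch. Policy retraining, the Sim-to-Real half of the loop, is omitted. The names `simulate` and `fit_damping` and all numeric defaults are illustrative.

```python
def simulate(damping, x0=1.0, v0=0.0, stiffness=4.0, dt=0.01, steps=200):
    """Roll out a unit-mass spring-damper with semi-implicit Euler.

    Alongside each position x_t, propagates the forward-mode sensitivity
    dx_t/d(damping), i.e. the gradient a differentiable simulator provides.
    """
    x, v = x0, v0
    dx, dv = 0.0, 0.0  # sensitivities of position and velocity w.r.t. damping
    positions, sensitivities = [], []
    for _ in range(steps):
        a = -stiffness * x - damping * v
        da = -stiffness * dx - v - damping * dv  # product rule on -damping * v
        v += dt * a
        dv += dt * da
        x += dt * v   # semi-implicit: uses the updated velocity
        dx += dt * dv
        positions.append(x)
        sensitivities.append(dx)
    return positions, sensitivities


def fit_damping(real_positions, damping=0.2, iterations=30):
    """Hypothetical Real-to-Sim step: tune damping so sim matches real data.

    Minimizes 0.5 * sum((x_sim - x_real)**2) with Gauss-Newton updates built
    from the simulator's sensitivities (J^T r / J^T J for a scalar parameter).
    """
    for _ in range(iterations):
        sim_positions, jac = simulate(damping)
        residuals = [s - r for s, r in zip(sim_positions, real_positions)]
        grad = sum(r * j for r, j in zip(residuals, jac))  # J^T r
        curvature = sum(j * j for j in jac)                 # J^T J
        damping -= grad / curvature
    return damping


if __name__ == "__main__":
    # Stand-in for real-world data: a rollout with a "hidden" damping of 0.8.
    real_positions, _ = simulate(damping=0.8)
    print(f"estimated damping: {fit_damping(real_positions):.4f}")
```

Starting from a deliberately wrong guess of 0.2, the estimate should move to the hidden value of 0.8 within a few iterations. In an MJX setting, the same sensitivities would come from JAX transformations such as `jax.grad` or `jax.jacfwd` applied to the rollout, rather than being propagated by hand.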
Abstract
The sim-to-real gap remains a critical challenge in robotics, hindering the deployment of algorithms trained in simulation to real-world systems. This paper introduces a novel Real-Sim-Real (RSR) loop framework that leverages differentiable simulation to address this gap: it iteratively refines simulation parameters to align them with real-world conditions, enabling robust and efficient policy transfer. A key contribution of our work is the design of an informative cost function that encourages the collection of diverse and representative real-world data, minimizing bias and maximizing the utility of each data point for simulation refinement. This cost function integrates seamlessly into existing reinforcement learning algorithms (e.g., PPO, SAC) and ensures a balanced exploration of critical regions in the real domain. Our approach is implemented on the versatile MuJoCo MJX platform and is compatible with a wide range of robotic systems. Experimental results on several robotic manipulation tasks demonstrate that our method significantly reduces the sim-to-real gap, achieving high task performance and generalizability across diverse scenarios involving both explicit and implicit environmental uncertainties.
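One common way for a cost term to plug into unmodified learners such as PPO or SAC is reward shaping. The sketch below shows that pattern under explicit assumptions. The information term is a hypothetical count-based surprisal bonus: the negative log of a Laplace-smoothed histogram probability over discretized states. It is not necessarily the paper's InfoGap formulation. `CountBasedInfoBonus`, `shaped_reward`, `beta`, and `bin_width` are illustrative names and values. States that are rare in the real-world data gathered so far earn a larger bonus, which nudges data collection toward under-represented regions.

```python
import math
from collections import Counter


class CountBasedInfoBonus:
    """Hypothetical information term: surprisal of a state under a histogram
    of the real-world states collected so far (not the paper's exact loss)."""

    def __init__(self, bin_width=0.1):
        self.bin_width = bin_width
        self.counts = Counter()
        self.total = 0

    def _bin(self, state):
        # Discretize a continuous state vector into a hashable grid cell.
        return tuple(math.floor(s / self.bin_width) for s in state)

    def bonus(self, state):
        # Laplace-smoothed probability; one extra slot reserves mass for unseen cells.
        count = self.counts[self._bin(state)]
        prob = (count + 1) / (self.total + len(self.counts) + 1)
        return -math.log(prob)  # rare cells -> large surprisal

    def record(self, state):
        self.counts[self._bin(state)] += 1
        self.total += 1


def shaped_reward(task_reward, state, info, beta=0.1):
    """Task reward plus a weighted information bonus, passed to PPO or SAC."""
    return task_reward + beta * info.bonus(state)


if __name__ == "__main__":
    info = CountBasedInfoBonus()
    for _ in range(10):
        info.record((0.05, 0.05))  # a heavily sampled region
    print(shaped_reward(1.0, (0.05, 0.05), info))  # frequently seen state
    print(shaped_reward(1.0, (0.95, 0.95), info))  # novel state, larger reward
```

Because only the reward signal changes, the learner itself needs no modification. A fixed grid is used here only for brevity. In high-dimensional state spaces, a learned density model would be a more plausible choice.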
