SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Yarden As, Chengrui Qu, Benjamin Unger, Dongho Kang, Max van der Hart, Laixi Shi, Stelian Coros, Adam Wierman, Andreas Krause
TL;DR
The paper tackles zero-shot safe transfer from simulation to real robots by addressing the sim-to-real safety gap. It introduces SPiDR, a pessimistic domain-randomization approach that augments the constrained Markov decision process (CMDP) framework with a penalized cost. The penalty, approximated in practice by ensemble disagreement, bounds the real-world cost under model mismatch, where the mismatch is measured by the $L_1$-Wasserstein distance $D_W(\hat p_\xi, p^\star)(s,a)$. The key theoretical result shows that solving the penalized CMDP yields a policy satisfying the real-world safety constraint $C_{p^\star}(\pi) \le d$. SPiDR remains compatible with standard RL pipelines and scales to sim-to-sim and real-world tasks, including vision-based control. Empirically, SPiDR demonstrates safe zero-shot transfer on two real robotic platforms (Race Car and Unitree Go1) and strong performance across sim-to-sim benchmarks, with ablations indicating robustness to the penalty parameter and ensemble size.
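To make the penalized cost concrete, the following is a minimal sketch, not the authors' implementation. It assumes an ensemble of domain-randomized simulators that each predict a next state for the same state-action pair, and it measures disagreement as the largest pairwise Euclidean distance between those predictions. The disagreement measure and the names `ensemble_disagreement`, `penalized_cost`, and `penalty_scale` are illustrative assumptions; the summary above does not specify them.

```python
import numpy as np


def ensemble_disagreement(next_states: np.ndarray) -> np.ndarray:
    """Disagreement among ensemble predictions (assumed measure).

    next_states has shape (n_models, batch, state_dim): one predicted next
    state per randomized simulator and per state-action pair.
    Returns an array of shape (batch,).
    """
    # Pairwise differences between ensemble members: (n, n, batch, state_dim).
    diffs = next_states[:, None, :, :] - next_states[None, :, :, :]
    # Euclidean distance for every pair of members: (n, n, batch).
    dists = np.linalg.norm(diffs, axis=-1)
    # Pessimistic choice: take the largest pairwise distance per sample.
    return dists.max(axis=(0, 1))


def penalized_cost(
    cost: np.ndarray, next_states: np.ndarray, penalty_scale: float
) -> np.ndarray:
    """Simulator cost plus a disagreement penalty weighted by penalty_scale."""
    return cost + penalty_scale * ensemble_disagreement(next_states)
```

When all ensemble members agree on the next state, the penalty vanishes and the penalized cost equals the simulator cost. Where they disagree, the cost is inflated, which pushes the constrained policy away from state-action pairs whose dynamics are uncertain.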
Abstract
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe but difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization, a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
