Neural Fidelity Calibration for Informative Sim-to-Real Adaptation

Youwei Yu, Lantao Liu

TL;DR

This work tackles the sim-to-real gap with Neural Fidelity Calibration (NFC), a diffusion-model-based framework that calibrates simulator physics and residual fidelity, including perception uncertainty, online during robot execution. By combining anomaly-driven policy fine-tuning, sequential NFC with a proposal prior, and optimistic exploration, NFC enables informative, data-efficient policy adaptation across diverse robots and challenging real-world conditions, such as a broken wheel axle on snow. Key contributions include a diffusion-based neural posterior for both calibration and residuals, a residual-fidelity model that captures dynamics and perception shifts, and an integration of anomaly detection with hallucinated policy optimization to guide online learning. The approach reports higher calibration accuracy and policy improvement in both sim-to-sim and real-world experiments, offering a practical path to online sim-to-real adaptation without extensive expert physics priors.

Abstract

Deep reinforcement learning can seamlessly transfer agile locomotion and navigation skills from simulation to the real world. However, bridging the sim-to-real gap with domain randomization or adversarial methods often demands expert physics knowledge to ensure policy robustness. Even so, cutting-edge simulators may fall short of capturing every real-world detail, and the reconstructed environment may introduce errors due to various perception uncertainties. To address these challenges, we propose Neural Fidelity Calibration (NFC), a novel framework that employs conditional score-based diffusion models to calibrate simulator physical coefficients and residual fidelity domains online during robot execution. Specifically, the residual fidelity reflects the simulation model shift relative to the real-world dynamics and captures the uncertainty of the perceived environment, enabling us to sample realistic environments under the inferred distribution for policy fine-tuning. Our framework is informative and adaptive in three key ways: (a) we fine-tune the pretrained policy only under anomalous scenarios, (b) we build sequential NFC online with the pretrained NFC's proposal prior, reducing the diffusion model's training burden, and (c) when NFC uncertainty is high and may degrade policy improvement, we leverage optimistic exploration to enable hallucinated policy optimization. Our framework achieves superior simulator calibration precision compared to state-of-the-art methods across diverse robots with high-dimensional parametric spaces. We study the critical contribution of residual fidelity to policy improvement in simulation and real-world experiments. Notably, our approach demonstrates robust robot navigation under challenging real-world conditions, such as a broken wheel axle on snowy surfaces.
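The gating described in (a) and (c) can be read as a per-episode decision. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the anomaly metric (mean distance between real and simulated trajectories), both thresholds, and the names `anomaly_score` and `choose_adaptation_mode` are hypothetical, and the posterior spread is passed in as a plain array rather than produced by a diffusion model.

```python
import numpy as np

def anomaly_score(real_traj, sim_traj):
    """Mean Euclidean distance between matching real and simulated states.

    Hypothetical metric chosen for illustration; the paper's detector may differ.
    """
    real = np.asarray(real_traj, dtype=float)
    sim = np.asarray(sim_traj, dtype=float)
    return float(np.mean(np.linalg.norm(real - sim, axis=-1)))

def choose_adaptation_mode(real_traj, sim_traj, posterior_std,
                           anomaly_threshold=0.5, uncertainty_threshold=1.0):
    """Pick an adaptation branch for one episode (thresholds are assumed values)."""
    # (a) No anomaly detected: keep the pretrained policy untouched.
    if anomaly_score(real_traj, sim_traj) <= anomaly_threshold:
        return "keep_policy"
    # (c) Anomaly detected but the calibration posterior is too uncertain:
    # fall back to optimistic exploration.
    if float(np.max(posterior_std)) > uncertainty_threshold:
        return "optimistic_finetune"
    # Anomaly detected and the posterior is confident: fine-tune under it.
    return "finetune"
```

For example, a simulated trajectory offset by one unit from the real one with a tight posterior (`posterior_std = [0.1, 0.2]`) selects `"finetune"`, while the same offset with a wide posterior selects `"optimistic_finetune"`.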

Paper Structure

This paper contains 35 sections, 21 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Neural (diffusion-model) Fidelity (left figure) Calibration (middle figure), NFC, enables informative (anomaly-detection) sim-to-real (right figure) policy transfer. Residual fidelity identifies the dynamics shift from sim to real and the residual environment arising from perception uncertainty. Simulator calibration finds suitable physical coefficients to match the real-world trajectory. In the left figure, colored trajectories represent those in the calibrated simulator, while the black trajectory corresponds to the real-world execution. The difference between them indicates the sim-to-real residual dynamics. The circled area on the elevated terrain highlights regions with uncertain perception, where NFC samples the residual environment (the difference between the ground-truth terrain elevation and the perceived elevation) and reconstructs multiple terrain variations in simulation to fine-tune the policy. The RL policy and NFC, both initialized in simulation, are fine-tuned only in anomalous situations.
  • Figure 2: Neural Fidelity Randomization enables sampling of high-dimensional residual environment geometries by accounting for perception uncertainty while preserving realism. As the (normalized) variance increases, the generated samples exhibit greater diversity in appearance.
  • Figure 3: Simulator calibration posteriors across five robots are compared between our method and competing approaches in Flat (empty-world) and Rough (unstructured) environments, with the ground-truth values marked by red crosses. The residual fidelity posterior, which includes residual dynamics (position $x$ and $y$) and residual environment (height in $z$), is shown only for our method due to the poor performance of other approaches. Simulator calibration parameters: [Ant] torso mass vs. left back foot mass, [Quadruped] rear front shank mass vs. right rear shank mass, [Humanoid] torso mass vs. right hip stiffness, [Quadcopter] first rotor mass vs. rotor third arm mass, [Jackal] rear right wheel damping vs. front left wheel restitution.
  • Figure 4: Real-world experiment environments with various surfaces and physical properties. The vehicle's left front axle was broken to create an anomalous situation, and the additional rock introduced challenging inertial forces.
  • Figure 5: Our Neural Fidelity infers the residual environment $\{\Delta \boldsymbol{e}\}^{N}$ based on the uncertain onboard elevation mapping $\boldsymbol{e}$ in the real world. The right side shows $N = 8$ reconstructed environments, $\hat{\boldsymbol{e}} = \boldsymbol{e} + \Delta \boldsymbol{e}$, in simulation to fine-tune the policy.
  • ...and 1 more figure
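The reconstruction in the Figure 5 caption, $\hat{\boldsymbol{e}} = \boldsymbol{e} + \Delta \boldsymbol{e}$ with $N = 8$ samples, can be sketched as follows. This is an illustrative stand-in only: it draws $\Delta \boldsymbol{e}$ from independent per-cell Gaussians scaled by an assumed uncertainty map, whereas the paper samples residuals from a diffusion model. The function name `sample_reconstructed_terrains` and the `uncertainty` input are hypothetical.

```python
import numpy as np

def sample_reconstructed_terrains(elevation, uncertainty, n_samples=8, seed=0):
    """Return n_samples terrains e_hat = e + delta_e, shape (n_samples, H, W).

    elevation:   perceived elevation map e, shape (H, W).
    uncertainty: assumed per-cell standard deviation of the residual, shape (H, W).
    """
    rng = np.random.default_rng(seed)
    e = np.asarray(elevation, dtype=float)
    sigma = np.asarray(uncertainty, dtype=float)
    # Gaussian placeholder for the learned residual sampler: cells with zero
    # uncertainty keep the perceived elevation unchanged.
    delta_e = rng.standard_normal((n_samples, *e.shape)) * sigma
    return e + delta_e

# Usage: a flat 4x4 map whose central 2x2 patch is perceived with uncertainty.
e = np.zeros((4, 4))
sigma = np.zeros((4, 4))
sigma[1:3, 1:3] = 0.2
terrains = sample_reconstructed_terrains(e, sigma)
```

Each of the resulting terrains would then be loaded into the simulator as one fine-tuning environment; only the uncertain patch varies between samples.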