Sim-Anchored Learning for On-the-Fly Adaptation
Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso
TL;DR
This work tackles the sim-to-real transfer problem by introducing anchor critics that preserve the simulation-designed priority profile during real-world adaptation. It frames live adaptation as a multi-objective optimization between a source-domain anchor critic $Q_\Psi$ and a target-domain critic $Q_\pi$, combining them via a geometric-mean conjunction $J_{\pi_\theta}^{\Psi}$ within Fulfillment Priority Logic. In sim-to-sim and real-robot experiments, particularly with a quadrotor using the SwaNNFlight stack, the approach retains intended behaviors while reducing power consumption by up to 50% without control loss and producing smoother control. Open-source firmware and tooling are provided to enable on-the-fly adaptation on similar platforms. The key contributions are the anchor-critic formulation, an experimental validation across simulation and real hardware, and the SwaNNFlight/SwaNNLake infrastructure for live policy updates. The results indicate that anchoring adaptation to simulation intent can mitigate catastrophic forgetting and deliver practical improvements in safety, efficiency, and robustness for real-time robotic control.
Abstract
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepresented scenarios. We propose framing live adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality. Our approach leverages critics from simulation as "anchors for design intent" (anchor critics). By jointly optimizing policies against both anchor critics and critics trained on real-world experience, our method enables adaptation while preserving prioritized behaviors from simulation. Evaluations demonstrate robust behavior retention in sim-to-sim benchmarks and a sim-to-real scenario with a racing quadrotor, allowing for power consumption reductions of up to 50% without control loss. We also contribute SwaNNFlight, an open-source firmware for enabling live adaptation on similar robotic platforms.
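To give a rough feel for why a geometric-mean combination of an anchor critic and a real-world critic discourages forgetting, the following minimal sketch scores two hypothetical candidate behaviors. It is an illustration under stated assumptions, not the method's actual training loop: it assumes both critic estimates can be rescaled into [0, 1] with known bounds, and the function names (`normalize`, `anchored_objective`, `best_action`) and all numeric values are invented for this example. In practice the policy would be optimized by gradient ascent on such an objective rather than by picking from a fixed set.

```python
import math


def normalize(q, q_min, q_max):
    """Rescale a raw critic estimate into [0, 1]; the bounds are assumed known."""
    if q_max <= q_min:
        raise ValueError("q_max must be greater than q_min")
    return min(1.0, max(0.0, (q - q_min) / (q_max - q_min)))


def anchored_objective(q_anchor, q_target):
    """Geometric mean of two normalized critic values, each in [0, 1]."""
    return math.sqrt(q_anchor * q_target)


def best_action(candidates):
    """Return the name of the candidate with the highest combined score.

    `candidates` maps a name to a pair
    (normalized anchor value, normalized target value).
    """
    return max(candidates, key=lambda name: anchored_objective(*candidates[name]))


# Toy values, invented for illustration only.
candidates = {
    "real_only": (0.05, 1.00),  # excels on real data, nearly abandons simulation intent
    "balanced": (0.50, 0.50),   # moderate in both domains
}
print(best_action(candidates))
```

With these toy numbers an arithmetic mean would favor `real_only` (0.525 versus 0.5), whereas the geometric mean favors `balanced`, because a near-zero value from either critic pulls the combined score toward zero. This is the sense in which the anchor critic can act as a constraint on adaptation rather than as one more additive reward term.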
