Learning Agile Quadrotor Flight in the Real World
Yunfan Ren, Zhiyuan Zhu, Jiaxu Xing, Davide Scaramuzza
TL;DR
This work tackles real-world agile quadrotor control by enabling on-policy adaptation without precise system identification. It introduces Adaptive Temporal Scaling (ATS) to actively trade speed for safety and online residual learning to capture unmodeled dynamics, coupled with Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) for efficient in-flight policy updates. The framework improves rapidly from conservative flight to near-limit agility within about 100 seconds of flight time, and is robust to hardware changes and wind disturbances, including a 42% reduction in mission time during an inspection task. By tightly integrating differentiable simulation, online residual learning, and ATS, the approach provides a practical pathway to sustained performance gains in aggressive flight regimes without offline re-identification. These results underscore real-world adaptation as a powerful mechanism for maintaining high agility under evolving dynamics while preserving safety.
Abstract
Learning-based controllers have achieved impressive performance in agile quadrotor flight but typically rely on massive training in simulation, necessitating accurate system identification for effective Sim2Real transfer. However, even with precise modeling, fixed policies remain susceptible to out-of-distribution scenarios, ranging from external aerodynamic disturbances to internal hardware degradation. To ensure safety under these evolving uncertainties, such controllers are forced to operate with conservative safety margins, inherently constraining their agility outside of controlled settings. While online adaptation offers a potential remedy, safely exploring physical limits remains a critical bottleneck due to data scarcity and safety risks. To bridge this gap, we propose a self-adaptive framework that eliminates the need for precise system identification or offline Sim2Real transfer. We introduce Adaptive Temporal Scaling (ATS) to actively explore platform physical limits, and employ online residual learning to augment a simple nominal model. Based on the learned hybrid model, we further propose Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) to achieve efficient and robust in-flight policy updates. Extensive experiments demonstrate that our quadrotor reliably executes agile maneuvers near actuator saturation limits. Starting from a conservative base policy with a peak speed of 1.9 m/s, the system reaches a peak speed of 7.3 m/s within approximately 100 seconds of flight time. These findings underscore that real-world adaptation serves not merely to compensate for modeling errors, but as a practical mechanism for sustained performance improvement in aggressive flight regimes.
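The abstract does not specify how ATS is implemented. As a rough illustration of the general idea of temporal scaling only, the sketch below replays a fixed reference path at a different rate and measures the resulting peak speed. The circular reference, the scale factor `alpha`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def circle_reference(t, radius=2.0, omega=1.0):
    """Hypothetical reference path: position on a circle at time t."""
    return (radius * math.cos(omega * t), radius * math.sin(omega * t))

def scaled_reference(t, alpha, ref=circle_reference):
    """Replay `ref` at `alpha` times its original rate (same geometric path)."""
    return ref(alpha * t)

def peak_speed(alpha, duration=10.0, dt=1e-3, ref=circle_reference):
    """Estimate the peak speed of the scaled reference by central differences."""
    peak = 0.0
    steps = int(duration / dt)
    for k in range(1, steps):
        t = k * dt
        x0, y0 = scaled_reference(t - dt, alpha, ref)
        x1, y1 = scaled_reference(t + dt, alpha, ref)
        speed = math.hypot(x1 - x0, y1 - y0) / (2 * dt)
        peak = max(peak, speed)
    return peak

base = peak_speed(1.0)  # nominal timing
fast = peak_speed(2.0)  # same path, traversed twice as fast
print(f"peak speed ratio: {fast / base:.3f}")
```

Under pure time scaling the path geometry is unchanged, velocities scale linearly with `alpha`, and accelerations scale with `alpha` squared. Actuator limits are therefore approached much faster than the speed gain alone suggests, which is one reason a scale factor like this would have to be adapted cautiously.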
