Learning Agile Quadrotor Flight in the Real World

Yunfan Ren; Zhiyuan Zhu; Jiaxu Xing; Davide Scaramuzza

TL;DR

This work tackles real-world agile quadrotor control by enabling on-policy adaptation without precise system identification. It introduces Adaptive Temporal Scaling (ATS) to actively trade speed for safety and Online Residual Learning to capture unmodeled dynamics, coupled with Real-world Anchored Short-Horizon BPTT (RASH-BPTT) for efficient in-flight policy updates. The framework improves from conservative to near-limit agility within about 100 seconds of flight and is robust to hardware changes and wind disturbances, including a 42% reduction in mission time during an inspection task. By tightly integrating differentiable simulation, online residual learning, and ATS, the approach provides a practical pathway for sustained performance gains in aggressive flight regimes without offline re-identification. These results underscore real-world adaptation as a powerful mechanism to maintain high agility under evolving dynamics while preserving safety.
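As a rough illustration of the residual-learning idea mentioned above, the following Python sketch fits a small residual term on top of a deliberately simple nominal model. Everything in it is an assumption made for illustration: the 1-D velocity dynamics, the linear-drag stand-in for the real platform, the linear residual form, and the names `nominal_accel`, `true_accel`, and `fit_residual`. The paper itself describes a neural residual network on top of a nominal rigid-body model.

```python
import random

def nominal_accel(v, u):
    """Nominal model (assumed): acceleration equals the commanded input."""
    return u

def true_accel(v, u, drag=0.4):
    """Stand-in for the real platform: nominal model plus unmodeled linear drag."""
    return u - drag * v

def fit_residual(samples, lr=0.05, epochs=200):
    """Fit a residual r(v) = w * v + b by SGD on the hybrid prediction error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for v, u, a_meas in samples:
            # Hybrid prediction = nominal model + learned residual.
            err = (nominal_accel(v, u) + w * v + b) - a_meas
            w -= lr * err * v
            b -= lr * err
    return w, b

# Hypothetical flight log: (velocity, command, measured acceleration).
random.seed(0)
samples = []
for _ in range(50):
    v = random.uniform(-3.0, 3.0)
    u = random.uniform(-2.0, 2.0)
    samples.append((v, u, true_accel(v, u)))

w, b = fit_residual(samples)
print(f"learned residual: w={w:.3f}, b={b:.3f}")
```

In this toy setting the residual only has to recover the drag coefficient. A real residual would also need to cope with sensor noise, delays, and data collected while the policy itself is changing.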

Abstract

Learning-based controllers have achieved impressive performance in agile quadrotor flight but typically rely on massive training in simulation, necessitating accurate system identification for effective Sim2Real transfer. However, even with precise modeling, fixed policies remain susceptible to out-of-distribution scenarios, ranging from external aerodynamic disturbances to internal hardware degradation. To ensure safety under these evolving uncertainties, such controllers are forced to operate with conservative safety margins, inherently constraining their agility outside of controlled settings. While online adaptation offers a potential remedy, safely exploring physical limits remains a critical bottleneck due to data scarcity and safety risks. To bridge this gap, we propose a self-adaptive framework that eliminates the need for precise system identification or offline Sim2Real transfer. We introduce Adaptive Temporal Scaling (ATS) to actively explore platform physical limits, and employ online residual learning to augment a simple nominal model. Based on the learned hybrid model, we further propose Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) to achieve efficient and robust in-flight policy updates. Extensive experiments demonstrate that our quadrotor reliably executes agile maneuvers near actuator saturation limits. The system improves a conservative base policy from a peak speed of 1.9 m/s to 7.3 m/s within approximately 100 seconds of flight time. These findings underscore that real-world adaptation serves not merely to compensate for modeling errors, but as a practical mechanism for sustained performance improvement in aggressive flight regimes.
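The anchored short-horizon idea can be sketched on a toy problem: roll a differentiable model forward for a few steps from recorded states, backpropagate the tracking cost through time, and update the policy parameter. The Python sketch below is only an assumption-laden illustration, not the authors' implementation. The scalar state, the linear policy `u = -k * x`, the residual coefficient `a_res`, the cost weights, and the function names are all hypothetical.

```python
def rollout_loss_and_grad(x0, k, horizon=10, dt=0.02, a_res=0.5, rho=0.01):
    """Short-horizon rollout of x' = u + a_res * x with policy u = -k * x.

    Returns the loss sum(x_{t+1}^2 + rho * u_t^2) and its gradient w.r.t. k,
    computed by a hand-written backward pass through time.
    """
    xs, us = [x0], []
    for _ in range(horizon):
        u = -k * xs[-1]
        us.append(u)
        xs.append(xs[-1] + dt * (u + a_res * xs[-1]))
    loss = sum(x * x for x in xs[1:]) + rho * sum(u * u for u in us)

    grad_k = 0.0
    downstream = 0.0  # dLoss/dx_{t+1} contributed by later time steps
    for t in reversed(range(horizon)):
        adj_x_next = 2.0 * xs[t + 1] + downstream    # total dLoss/dx_{t+1}
        adj_u = 2.0 * rho * us[t] + adj_x_next * dt  # dLoss/du_t
        grad_k += adj_u * (-xs[t])                   # since du_t/dk = -x_t
        downstream = adj_x_next * (1.0 + dt * a_res) + adj_u * (-k)
    return loss, grad_k

def anchored_policy_update(anchors, k, lr=0.5, iters=50):
    """Gradient descent on k, with every rollout started from a recorded state."""
    for _ in range(iters):
        grads = [rollout_loss_and_grad(x0, k)[1] for x0 in anchors]
        k -= lr * sum(grads) / len(grads)
    return k

def mean_loss(anchors, k):
    return sum(rollout_loss_and_grad(x0, k)[0] for x0 in anchors) / len(anchors)

# Hypothetical states taken from a recent real-world rollout (the "anchors").
anchors = [1.0, -0.5, 1.5, 0.8]
k_before = 1.0
k_after = anchored_policy_update(anchors, k_before)
print(mean_loss(anchors, k_before), mean_loss(anchors, k_after))
```

Starting each rollout from a recorded state and keeping the horizon short limits how far model errors can compound before the gradient is computed, which is the motivation the abstract and Figure 1 give for anchoring.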

Paper Structure (24 sections, 15 equations, 8 figures)

Introduction
Related Work
Learning-based Agile Flight Control
Efficient Control Policy Learning
Methodology
Online Residual Dynamics Learning
Hybrid Continuous-Time Dynamics
Differentiable Integration and Online Training
Real-World Anchored Short-Horizon BPTT
Adaptive Temporal Scaling via Differentiable Optimization
Parameterization
Optimization Objective and Update
Results and Experiments
Real-World Agile Flight and Adaptation
Pushing the Physical Limits (RQ1)
...and 9 more sections

Figures (8)

Figure 1: Overview of the self-adaptive autonomous flight framework. The system operates as a continuous closed-loop cycle (bottom right) bridging physical execution and differentiable simulation: (A) Policy Learning: Leveraging a learned hybrid dynamics model in a differentiable simulator, we perform RASH-BPTT to optimize the control policy via massively parallelized rollouts. (B) Real-World Rollout: The agent executes the current policy on the physical quadrotor to collect state-action-transition data. (C) Model Calibration: Collected data is used to update the Hybrid Dynamics Model online, where a neural residual network learns to compensate for the reality gap (e.g., unmodeled aerodynamics, delays) of the nominal rigid-body model. (D) Anchored Initialization: To mitigate compounding prediction errors, simulation rollouts are initialized (anchored) using the most recent real-world state estimates rather than random resets. (E) Adaptive Temporal Scaling (ATS): Tightly coupled with policy optimization, the trajectory time-scale $\alpha$ is jointly optimized based on real-world rollouts. Leveraging analytical gradients derived from closed-loop sensitivity, it maximizes agility (speed) while enforcing safety constraints via a barrier function.
Figure 2: The optimization landscape for ATS. The heatmap visualizes the composite potential $\mathcal{J}_{\text{ATS}}$, balancing agility (low $\alpha$) against safety. The landscape transitions from the Safe Zone (blue) to the Unsafe Zone (red) determined by the tracking error threshold $\mathcal{E}_{\text{th}}$ (dotted line) and a schematic representation of the system's physical limits (dashed curve). The green dot marks the optimal equilibrium: the most aggressive time scale achievable within safe tracking bounds.
Figure 3: Experimental platform with extreme modifications. To validate robustness, the nominal quadrotor (192 g) is subjected to drastic degradations: mechanically clipped propellers (inducing aerodynamic loss) and a 60 g payload. This 31% mass increase significantly alters the inertial properties and reduces the thrust-to-weight ratio.
Figure 4: Commanded body-rate $\bm{\omega}_{\mathrm{cmd}}$ during real-world Figure-8 and Line-Shuttle flights. Solid curves denote the 2 s sliding-window mean, overlaid on raw measurements (background traces). The orange dashed line marks the actuation limit ($6$ rad/s).
Figure 5: Adaptation to hardware variations. The framework is tested with i) added mass, ii) propeller damage, and iii) combined conditions. In all cases, the residual learning module swiftly compensates for the dynamic mismatch within one iteration, enabling the ATS to safely push the compromised hardware to its new physical limits.
...and 3 more figures
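
The trade-off described in the captions of Figures 1 and 2 (lower $\alpha$ means faster flight, with a barrier keeping the tracking error below $\mathcal{E}_{\text{th}}$) can be sketched with a one-dimensional toy problem. The Python sketch below rests on stated assumptions and is not the paper's formulation. The error model `C / alpha**2`, the log-barrier form, the constants `E_TH`, `C`, and `MU`, and all function names are hypothetical. The paper obtains its gradients from closed-loop sensitivity on real rollouts, not from a closed-form error model.

```python
import math

E_TH = 0.2   # tracking-error threshold (assumed value)
C = 0.5      # coefficient of the hypothetical error model
MU = 0.01    # barrier weight (assumed value)

def tracking_error(alpha):
    """Hypothetical model: faster execution (small alpha) tracks worse."""
    return C / alpha**2

def ats_objective(alpha):
    """Small alpha is rewarded; the log barrier grows as the error nears E_TH."""
    return alpha - MU * math.log(E_TH - tracking_error(alpha))

def ats_gradient(alpha):
    """Analytic derivative of ats_objective with respect to alpha."""
    de_dalpha = -2.0 * C / alpha**3
    return 1.0 + MU * de_dalpha / (E_TH - tracking_error(alpha))

def optimize_alpha(alpha=4.0, lr=0.005, steps=2000):
    """Gradient descent on alpha that never leaves the safe set."""
    for _ in range(steps):
        step = lr * ats_gradient(alpha)
        # Backtrack if the step would push the error to or past the threshold.
        while tracking_error(alpha - step) >= E_TH:
            step *= 0.5
        alpha -= step
    return alpha

alpha_star = optimize_alpha()
print(f"alpha = {alpha_star:.4f}, error = {tracking_error(alpha_star):.4f}")
```

The iterate settles just inside the safe region, where the pull toward a shorter time scale balances the barrier's push away from the threshold. This corresponds to the equilibrium marked by the green dot in Figure 2.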
