Learning Agile Gate Traversal via Analytical Optimal Policy Gradient

Tianchen Sun, Bingheng Wang, Nuthasith Gerdpratoom, Longbin Tang, Yichao Gao, Lin Zhao

TL;DR

A novel hybrid framework is presented that adaptively fine-tunes model predictive control parameters online using outputs from a neural network trained offline.

Abstract

Traversing narrow gates presents a significant challenge and has become a standard benchmark for evaluating agile and precise quadrotor flight. Traditional modularized autonomous flight stacks require extensive design and parameter tuning, while end-to-end reinforcement learning (RL) methods often suffer from low sample efficiency, limited interpretability, and degraded disturbance rejection under unseen perturbations. In this work, we present a novel hybrid framework that adaptively fine-tunes model predictive control (MPC) parameters online using outputs from a neural network (NN) trained offline. The NN jointly predicts a reference pose and cost function weights, conditioned on the coordinates of the gate corners and the current drone state. To achieve efficient training, we derive analytical policy gradients not only for the MPC module but also for an optimization-based gate traversal detection module. Hardware experiments demonstrate agile and accurate gate traversal with peak accelerations of $30\ \mathrm{m/s^2}$, as well as recovery within $0.85\ \mathrm{s}$ following body-rate disturbances exceeding $1146\ \mathrm{deg/s}$.
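
As a rough sketch of how the two analytical gradients described above might compose (not the paper's exact derivation), the chain rule below uses assumed notation: $\mathbf{w}$ for the NN parameters, $\mathbf{p}$ for the NN outputs (the reference pose and cost function weights), $\boldsymbol{\xi}$ for the MPC-predicted trajectory, $d$ for the output of the gate traversal detection problem, and $L$ for the training loss. All of these symbols are illustrative assumptions.

$$
\frac{\mathrm{d} L}{\mathrm{d} \mathbf{w}}
= \underbrace{\frac{\partial L}{\partial d}\,\frac{\partial d}{\partial \boldsymbol{\xi}}}_{\text{gate traversal detection}}
\;\underbrace{\frac{\partial \boldsymbol{\xi}}{\partial \mathbf{p}}}_{\text{MPC}}
\;\underbrace{\frac{\partial \mathbf{p}}{\partial \mathbf{w}}}_{\text{NN}}
$$

Under this reading, the first two factors come from differentiating the solutions of the two optimization problems with respect to their inputs, and the last factor is ordinary backpropagation through the network.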

Paper Structure

This paper contains 21 sections, 20 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Multiple real flight demonstrations with gate angle ranging from $30^{\circ}$ to $70^{\circ}$. The trajectory of the quadrotor is illustrated through composited images generated from sequential snapshots. The gate orientations from subfigures A to F are $30^{\circ}, \ -45^{\circ}, \ 45^{\circ}, \ -65^{\circ}, \ 60^{\circ}$, and $-70^{\circ}$, respectively.
  • Figure 2: In this framework, the inference pass features two nested closed-loop feedback structures. In the outer closed loop, an NN predicts both the reference pose $\mathbf{T}_{\text{ref}}$ and the cost-term weights $\mathbf{Q}$, $\gamma$, based on the observed gate corner positions, the goal position, and the current state of the quadrotor $\mathbf{x}$. In the inner closed loop, the MPC module optimizes the predicted state and control trajectory, $\boldsymbol{\xi}_i$, and only the first control input, $\mathbf{u} = [f_r, {}^{\mathcal{B}}\boldsymbol{\omega}]$, is applied to the quadrotor. During training, a high-level loss is imposed by constructing a differentiable gate collision detection problem to evaluate the MPC-predicted trajectory. By composing the gradients from both the differentiable MPC and the differentiable gate collision detection module, an analytical optimal policy gradient is obtained to update the NN.
  • Figure 3: Evaluation results of 128 trials before and after training, showing success rates of 9.38% and 80.46%, respectively.
  • Figure 4: Illustration of real flight data during traversal of a gate tilted at $-65^{\circ}$. In real time, the neural network (NN) provides both a high-level reference pose (depicted by coordinate frames) and adaptive cost-term weights for MPC adjustment. The predicted trajectories from MPC and the NN-predicted poses at $t = 0.72\ \mathrm{s}$, $0.84\ \mathrm{s}$, and $0.96\ \mathrm{s}$ are plotted. The complete flight trajectory is captured in Fig. 1D.
  • Figure 5: Trajectories of the quadrotor state (dashed) and the NN-predicted high-level decision variables (solid) in a real flight. NN-predicted position and orientation are dashed lines. $\phi$, $\theta$, and $\psi$ are the Euler angles. The vertical dashed red line indicates the traversal time.
  • ...and 3 more figures