
TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning

Zikang Yin, Canlun Zheng, Shiliang Guo, Zhikun Wang, Shiyu Zhao

TL;DR

The paper tackles agile acrobatic MAV control under online maneuver variation, introducing TACO, a target-and-command-oriented reinforcement learning framework. TACO unifies the state representation through a target-aware, task-conditioned design and trains a Lipschitz-constrained policy, with the constraint enforced by spectral normalization, to enable zero-shot sim-to-real transfer. The approach achieves high-speed circular flight with large tilt angles and stable continuous flips, outperforming traditional MPC in command tracking and robustness. By combining a high-fidelity dynamics model, a structured reward, and a robust training regime, the work demonstrates practical viability for aggressive real-world MAV maneuvers and points toward broader online maneuver adaptation and generalization in aerial robotics.
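
To make "online parameter changes" concrete, here is a minimal Python sketch of a moving target for a CIRCLE-style task. It is an illustration under stated assumptions, not the authors' code: the class `CircleTarget`, its fields, and the fixed default altitude are hypothetical. The design point it shows is that integrating the phase angle at rate `speed / radius` lets the desired speed change mid-flight without the target jumping, whereas computing the angle directly as `speed * t / radius` would teleport the target whenever `speed` changes.

```python
import math
from dataclasses import dataclass


@dataclass
class CircleTarget:
    """Hypothetical moving target on a horizontal circle (illustration only)."""
    radius: float        # circle radius in m
    speed: float         # desired tangential speed in m/s; may be changed online
    height: float = 1.5  # fixed altitude in m (assumed value)
    phase: float = 0.0   # accumulated angle around the circle in rad

    def step(self, dt):
        # Integrate the angular rate omega = v / r. Changing `speed` changes
        # the rate from now on but leaves the current position untouched.
        self.phase = (self.phase + self.speed / self.radius * dt) % (2.0 * math.pi)
        return self.position()

    def position(self):
        return (self.radius * math.cos(self.phase),
                self.radius * math.sin(self.phase),
                self.height)

    def relative_to(self, mav_position):
        """Target position minus MAV position, both in the world frame."""
        return tuple(t - p for t, p in zip(self.position(), mav_position))


if __name__ == "__main__":
    target = CircleTarget(radius=2.0, speed=1.0)
    for k in range(300):
        if k == 150:
            target.speed = 3.0  # online command change: faster, same circle
        target.step(0.01)
    print(target.position())
```

A target-conditioned policy would observe something like the relative position plus the current command values; the exact observation layout TACO uses is not reproduced here.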

Abstract

Although acrobatic flight control has been studied extensively, one key limitation of the existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which can handle different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
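
The spectral normalization idea can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: a bias-free tanh MLP whose weight matrices are divided by their largest singular value (estimated by power iteration), with hypothetical positive per-dimension vectors `in_scale` and `out_scale` standing in for the input-output rescaling. Because tanh is 1-Lipschitz, the scale factors combine into an upper bound on how fast the action can change with the observation.

```python
import numpy as np


def spectral_norm(W, n_iters=200, seed=0):
    """Estimate the largest singular value of W with power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)


class LipschitzMLP:
    """Toy tanh MLP with spectrally normalized weights (no biases).

    Each weight matrix is divided by its estimated spectral norm and tanh is
    1-Lipschitz, so the core network is (approximately) 1-Lipschitz in the L2
    norm. `lipschitz` scales the last layer; `in_scale` / `out_scale` are
    hypothetical positive rescaling vectors for observations and actions.
    """

    def __init__(self, sizes, in_scale, out_scale, lipschitz=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.Ws = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            W = rng.standard_normal((n_out, n_in))
            self.Ws.append(W / spectral_norm(W))
        self.in_scale = np.asarray(in_scale, dtype=float)
        self.out_scale = np.asarray(out_scale, dtype=float)
        self.lipschitz = lipschitz

    def __call__(self, x):
        h = np.asarray(x, dtype=float) / self.in_scale  # input rescaling
        for W in self.Ws[:-1]:
            h = np.tanh(W @ h)                          # 1-Lipschitz activation
        h = self.lipschitz * (self.Ws[-1] @ h)          # target Lipschitz constant
        return self.out_scale * h                       # output rescaling

    def bound(self):
        """Upper bound on the L2 Lipschitz constant of x -> action."""
        return self.lipschitz * self.out_scale.max() / self.in_scale.min()


if __name__ == "__main__":
    net = LipschitzMLP([4, 16, 16, 3], in_scale=[1.0, 2.0, 2.0, 0.5],
                       out_scale=[3.0, 3.0, 1.0], lipschitz=1.5)
    print(net(np.zeros(4)), net.bound())
```

In a training loop, spectral normalization is usually applied as a weight reparametrization that runs one power-iteration step per update and keeps the vector `u` between updates (PyTorch offers this as `torch.nn.utils.parametrizations.spectral_norm`). Note how the scales enter the bound: a small `in_scale` entry, meaning a finely resolved observation dimension, raises the overall constant.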

Paper Structure

This paper contains 20 sections, 13 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The real-world acrobatic flight trajectories obtained with the TACO framework. (a) shows the flight trajectory of the CIRCLE task with increasing desired speed. (b) shows the flight trajectory of the FLIP task with multiple flips.
  • Figure 2: The overall structure of the TACO framework. The upper section presents the RL training system, including the components of the TACO framework (a) and the simulation environment (b). (c) shows the relationships among the real MAV, the target status (red ball), and the world frame. (d) and (e) show the MAV executing the CIRCLE and FLIP tasks in a real-world environment.
  • Figure 3: Real-world flight trajectories under different viewing angles and flight state curves in the CIRCLE task. (a) shows the 3D flight trajectory with the color indicating the time step. (b) shows the XOY 2D flight trajectory with the color indicating the speed. (c) shows the MAV's state and commands in order.
  • Figure 4: Real-world flight trajectories under different viewing angles and flight state curves in the FLIP task. (a) shows the 3D flight trajectory with the color indicating the time step. (b) shows the YOZ 2D flight trajectory with the color indicating the speed. (c) shows the MAV's state and commands in order.
  • Figure 5: Desired angular velocity output by different policies as a function of the yaw deviation in $(-\pi,\pi)$ rad. The label 'quat' denotes a policy using quaternions, and 'mat' denotes a policy using a rotation matrix. 'None' indicates that no Lipschitz constraint is used, while '1' and '1.5' give the Lipschitz constant.
  • ...and 1 more figure
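
Figure 5 compares quaternion and rotation-matrix attitude inputs over yaw deviations in $(-\pi,\pi)$. One plausible reason the representation matters near the ends of that interval is sketched below in plain Python (an illustration, not taken from the paper): yaw deviations of $\pi-\epsilon$ and $-\pi+\epsilon$ describe attitudes only $2\epsilon$ apart, yet the quaternion's $z$ component flips sign (since $q$ and $-q$ encode the same rotation), while the rotation-matrix entries change by only about $2\epsilon$.

```python
import math


def yaw_quaternion(psi):
    """Unit quaternion (w, x, y, z) for a pure yaw rotation of psi rad."""
    return (math.cos(psi / 2), 0.0, 0.0, math.sin(psi / 2))


def yaw_matrix(psi):
    """3x3 rotation matrix (nested lists) for a pure yaw rotation of psi rad."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def flat(x):
    """Flatten a matrix given as nested lists; pass flat sequences through."""
    return [v for row in x for v in row] if isinstance(x[0], list) else list(x)


def max_abs_diff(a, b):
    """Largest elementwise gap between two representations."""
    return max(abs(p - q) for p, q in zip(flat(a), flat(b)))


if __name__ == "__main__":
    eps = 1e-3
    just_below = math.pi - eps   # yaw deviation approaching +pi
    just_above = -math.pi + eps  # nearly the same attitude, approaching -pi
    q_gap = max_abs_diff(yaw_quaternion(just_below), yaw_quaternion(just_above))
    m_gap = max_abs_diff(yaw_matrix(just_below), yaw_matrix(just_above))
    print(f"quaternion gap: {q_gap:.3f}, matrix gap: {m_gap:.3f}")
    # prints "quaternion gap: 2.000, matrix gap: 0.002"
```

A network fed the quaternion therefore sees a near-maximal input jump for a tiny physical change unless the sign ambiguity is handled, whereas the matrix input stays continuous across the wrap-around.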