TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning
Zikang Yin, Canlun Zheng, Shiliang Guo, Zhikun Wang, Shiyu Zhao
TL;DR
The paper addresses agile acrobatic control of micro aerial vehicles (MAVs) when maneuver parameters change online, and introduces TACO, a target-and-command-oriented reinforcement learning framework. TACO uses a unified, target-aware and task-conditioned state representation, and trains a policy whose Lipschitz constant is constrained through spectral normalization, which enables zero-shot sim-to-real transfer. The approach achieves high-speed circular flight at large tilt angles and stable continuous flips, and outperforms traditional model predictive control (MPC) in command tracking and robustness. By combining a high-fidelity dynamics model, a structured reward, and a robust training regime, the work shows that aggressive MAV maneuvers are practical in the real world and points toward broader online maneuver adaptation and generalization in aerial robotics.
Abstract
Although acrobatic flight control has been studied extensively, one key limitation of existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which handles different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.
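To make the idea of spectral normalization with input-output rescaling concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-layer tanh network, fixed per-dimension scales, and a power-iteration estimate of the largest singular value. The abstract does not specify the rescaling scheme, so the names `spectral_norm`, `spectral_normalize`, `make_policy`, `lipschitz_bound`, `in_scale`, and `out_scale` are all hypothetical. Under these assumptions each normalized layer has spectral norm of about 1 and tanh is 1-Lipschitz, so the input and output scales alone set the bound on how fast the action can change with the observation.

```python
import math
import random


def l2(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def spectral_norm(W, iters=500, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    Wt = [list(col) for col in zip(*W)]
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = l2(v)
        if n == 0.0:
            return 0.0
        v = [vi / n for vi in v]
    return l2(matvec(W, v))


def spectral_normalize(W):
    """Divide W by its estimated spectral norm so the result has norm ~1."""
    sigma = spectral_norm(W)
    if sigma == 0.0:
        return W
    return [[w / sigma for w in row] for row in W]


def make_policy(W1, W2, in_scale, out_scale):
    """Build a two-layer tanh policy (an assumed architecture).

    Observations are divided by in_scale before the normalized layers, and
    the network output is multiplied by out_scale to obtain the action.
    """
    W1n, W2n = spectral_normalize(W1), spectral_normalize(W2)

    def policy(obs):
        x = [o / s for o, s in zip(obs, in_scale)]
        h = [math.tanh(z) for z in matvec(W1n, x)]  # tanh is 1-Lipschitz
        y = matvec(W2n, h)
        return [a * s for a, s in zip(y, out_scale)]

    return policy


def lipschitz_bound(in_scale, out_scale):
    """Upper bound on the policy's Lipschitz constant for positive scales."""
    return max(out_scale) / min(in_scale)
```

With `in_scale = [10.0, 10.0, 5.0, 5.0]` and `out_scale = [1.0, 2.0]`, for example, the bound is `2.0 / 5.0 = 0.4`, meaning a unit change in the observation can change the action by at most about 0.4 in Euclidean norm. A bound of this kind is one way to read the "spatial smoothness" mentioned in the abstract.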
