Multi-Task Reinforcement Learning of Drone Aerobatics by Exploiting Geometric Symmetries

Zhanyu Guo; Zikang Yin; Guobin Zhu; Shiliang Guo; Shiyu Zhao

TL;DR

GEAR addresses data-efficiency and generalization challenges in multi-task reinforcement learning for autonomous drone aerobatics by exploiting $SO(2)$ rotational symmetry in the policy architecture. It uses an $SO(2)$-equivariant actor backbone, FiLM-based task conditioning, and a multi-head critic to enable a single policy to master multiple maneuvers while maintaining task-specific learning signals. In high-fidelity simulations, GEAR achieves nearly $99\%$ success across aerobatic tasks and outperforms baseline methods; real-world experiments validate stable execution and composition of learned primitives into complex maneuvers such as Power Loop and Barrel Roll. By treating symmetry as a soft inductive bias and combining geometric priors with adaptive conditioning, the approach delivers data-efficient, robust, multi-task control for agile MAVs.

Abstract

Flight control for autonomous micro aerial vehicles (MAVs) is evolving from steady flight near equilibrium points toward more aggressive aerobatic maneuvers, such as flips, rolls, and Power Loops. Although reinforcement learning (RL) has shown great potential in these tasks, conventional RL methods often suffer from low data efficiency and limited generalization. This challenge becomes more pronounced in multi-task scenarios, where a single policy is required to master multiple maneuvers. In this paper, we propose a novel end-to-end multi-task reinforcement learning framework, called GEAR (Geometric Equivariant Aerobatics Reinforcement), which fully exploits the inherent $SO(2)$ rotational symmetry in MAV dynamics and explicitly incorporates this property into the policy network architecture. By integrating an equivariant actor network, FiLM-based task modulation, and a multi-head critic, GEAR achieves both efficiency and flexibility in learning diverse aerobatic maneuvers, providing a data-efficient, robust, and unified framework for aerobatic control. GEAR attains a 98.85% success rate across various aerobatic tasks, significantly outperforming baseline methods. In real-world experiments, GEAR demonstrates stable execution of multiple maneuvers and the capability to combine basic motion primitives to complete complex aerobatics.
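To make the notion of $SO(2)$ equivariance concrete, the following minimal sketch checks the defining property $f(Rv) = R f(v)$ for a single linear map acting on one 2D feature pair. It is an illustration only, not GEAR's actor network: the function names, the weights `a` and `b`, and the test vector are all hypothetical, and the choice of which state components rotate together is an assumption not taken from the paper.

```python
import math

def rotate(theta, v):
    """Rotate a 2D vector v by angle theta (an element of SO(2))."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def equivariant_linear(a, b, v):
    """Apply the matrix [[a, -b], [b, a]], which commutes with every 2D rotation."""
    return (a * v[0] - b * v[1], b * v[0] + a * v[1])

def max_equivariance_error(a, b, v, thetas):
    """Largest componentwise gap between f(R v) and R f(v) over the given angles."""
    worst = 0.0
    for theta in thetas:
        lhs = equivariant_linear(a, b, rotate(theta, v))  # rotate, then map
        rhs = rotate(theta, equivariant_linear(a, b, v))  # map, then rotate
        worst = max(worst, abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))
    return worst

# Hypothetical weights and input; twelve evenly spaced rotation angles.
thetas = [k * math.pi / 6 for k in range(12)]
error = max_equivariance_error(0.7, -1.3, (1.0, 2.0), thetas)
```

A generic 2x2 weight matrix would not pass this check. Restricting the weights to this two-parameter family is what lets one rotated experience stand in for all of its rotated copies, which is the intuition behind the data-efficiency claim.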

Paper Structure (30 sections, 14 equations, 3 figures, 1 table)

This paper contains 30 sections, 14 equations, 3 figures, 1 table.

Introduction
Related Work
MAVs' Acrobatic Maneuvers
Multi-Task Reinforcement Learning for Robotics
Equivariant Learning
Problem Formulation
MAV Dynamics
State Design
Reward Function Design
Basic tracking terms
Command adherence
Task-specific shaping
SO(2)-Symmetry and Equivariant Structure
Group action and representations
Equivariance of dynamics
...and 15 more sections

Figures (3)

Figure 1: Overview of the proposed GEAR framework. (a) Training structure: The policy consists of an actor and a multi-head critic, both combining FiLM layers with an Equivariant MLP (EMLP). The actor receives the state (with added noise during training), the command, and the action history as inputs and outputs low-level control actions. The critic provides task-specific value estimates, which are used only in training and discarded at deployment. (b) FiLM layer: commands generate scaling and shifting factors ($\gamma$, $\beta$) to modulate intermediate features, enabling task conditioning and controlled symmetry breaking. (c) Group representation: state features are partitioned into irreducible and trivial components.
Figure 2: Training curves of EMLP- and MLP-based frameworks on four tasks (Flip, Hover, Roll, and Rotate). Results are from a single joint training run of one policy across all tasks. Each curve represents the average reward over 10 random seeds, with shaded regions indicating the $95\%$ confidence interval.
Figure 3: Demonstration of real-world acrobatic maneuvers. Arrows denote the direction of motion. For the Roll maneuver, five consecutive images are concatenated to highlight its nearly constant altitude. Red markers indicate the timing of issued commands.
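
The FiLM modulation described in panel (b) of Figure 1 can be sketched as a feature-wise affine transform, $\gamma \odot h + \beta$, whose factors are computed from the command. The version below is a minimal illustration under stated assumptions: the linear generators, the weight values, the feature size, and the one-hot command encoding are all hypothetical and are not taken from the paper.

```python
def linear(weights, bias, x):
    """Dense layer: one output per weight row, computed as row . x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film(features, command, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate features with command-dependent scale (gamma) and shift (beta)."""
    gamma = linear(w_gamma, b_gamma, command)
    beta = linear(w_beta, b_beta, command)
    return [g * h + b for g, h, b in zip(gamma, features, beta)]

# Hypothetical two-dimensional features and a one-hot command selecting task 0.
features = [1.0, -2.0]
command = [1.0, 0.0]
w_gamma = [[2.0, 0.5], [1.0, 3.0]]
b_gamma = [0.0, 0.0]
w_beta = [[0.1, 0.0], [0.0, 0.2]]
b_beta = [0.0, 0.0]

modulated = film(features, command, w_gamma, b_gamma, w_beta, b_beta)
```

With this command, `gamma` is `[2.0, 1.0]` and `beta` is `[0.1, 0.0]`, so the same features are rescaled and shifted differently for each task. Because `beta` adds an offset that does not rotate with the features, a nonzero shift on a rotating feature is one way such a layer can break symmetry in a controlled, command-dependent manner.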