Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control

Beomyeol Yu, Taeyoung Lee

TL;DR

The paper addresses data efficiency and generalization in quadrotor low-level RL by exploiting geometry through group-equivariant networks. It develops two equivariant RL (ERL) frameworks, one monolithic and one modular. Both encode rotational and reflectional symmetries to reduce learning dimensionality and improve sample efficiency. They come with theoretical guarantees that the optimal value function $V^*$ is $G$-invariant and the optimal policy $\pi^*$ is $G$-equivariant. Empirical results in simulation and indoor real-world flights show that the equivariant models, especially the modular ERL (Mod-EMLP), converge faster and achieve better tracking, including yaw control, while enabling zero-shot sim-to-real transfer. Overall, the work advances geometric RL for robotics by demonstrating practical gains in data efficiency, robustness, and transferability for quadrotor control. It also lays groundwork for symmetry-aware learning in other symmetric robotic systems.
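The invariance and equivariance guarantees can be written out explicitly. The statement below is a hedged restatement in the standard form for symmetric control problems. It assumes $G$-equivariant dynamics $F$ (the property Proposition 1 establishes) and a reward $r$ that is assumed to be $G$-invariant; the paper's exact hypotheses may differ. When optimal policies are not unique, the equivariance claim applies to a suitably chosen optimal policy.

$$
r(g\cdot s,\, g\cdot a) = r(s, a),\quad F(g\cdot s,\, g\cdot a) = g\cdot F(s, a)
\;\;\Longrightarrow\;\;
V^*(g\cdot s) = V^*(s),\quad \pi^*(g\cdot s) = g\cdot \pi^*(s)\qquad \forall g \in G .
$$

In words, a state has the same optimal value everywhere on its orbit $G\cdot s$. The optimal action at $g\cdot s$ is the optimal action at $s$ transformed by $g$. This is why a policy learned at one point on an orbit carries over to the whole orbit.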

Abstract

Improving sampling efficiency and generalization capability is critical for the successful data-driven control of quadrotor unmanned aerial vehicles (UAVs) that are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address these issues, we propose data-efficient, equivariant monolithic and modular RL frameworks for quadrotor low-level control. Specifically, by identifying the rotational and reflectional symmetries in quadrotor dynamics and encoding these symmetries into equivariant network models, we remove redundancies of learning in the state-action space. This approach enables the optimal control action learned in one configuration to automatically generalize into other configurations via symmetry, thereby enhancing data efficiency. Experimental results demonstrate that our equivariant approaches significantly outperform their non-equivariant counterparts in terms of learning efficiency and flight performance.

Paper Structure

This paper contains 20 sections, 4 theorems, 48 equations, 8 figures, 3 tables.

Key Result

Proposition 1

Let the equations of motion (eqn:x_dot)–(eqn:W_dot) be consolidated into $\dot{s} = F(s, a)$, where $F:\mathcal{S}_\textrm{mono}\times \mathcal{A}_\textrm{mono}\rightarrow T\mathcal{S}_\textrm{mono}$ and $T\mathcal{S}_\textrm{mono}$ denotes the tangent bundle of the state space. Then $F$ is equivariant with respect to the action defined in (eqn:gs_mono) and (eqn:ga_mono), i.e., $F(g\cdot s, g\cdot a) = g\cdot F(s, a)$ for all $g \in G$, where the action on $T\mathcal{S}_\textrm{mono}$ …
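A quick numerical check makes Proposition 1 concrete. The sketch assumes the standard geometric quadrotor model, which is not reproduced in this summary: $\dot{x} = v$, $m\dot{v} = mg\vec{e}_3 - fR\vec{e}_3$, $\dot{R} = R\hat{W}$, and $J\dot{W} + W\times JW = M$, with $\vec{e}_3$ pointing along gravity. It also assumes a group action that rotates $x$, $v$, and $R$ by $g \in \mathsf{SO(3)}_{\vec{e}_3}$ while leaving $W$, $f$, and $M$ unchanged. The mass, inertia, and helper names (`F`, `act`, `equivariance_gap`) are hypothetical. The script evaluates $F(g\cdot s, g\cdot a) - g\cdot F(s, a)$ at a random state. It also shows that a rotation about a horizontal axis breaks the equivariance, because gravity singles out $\vec{e}_3$.

```python
import numpy as np

# Hypothetical parameters; assumed model with e3 pointing along gravity.
MASS, GRAV = 1.0, 9.81
J = np.diag([0.02, 0.02, 0.04])
E3 = np.array([0.0, 0.0, 1.0])


def hat(w):
    """Skew-symmetric matrix with hat(w) @ u == np.cross(w, u)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def expm_so3(w):
    """Rodrigues' formula: the rotation matrix exp(hat(w))."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K


def F(state, action):
    """Vector field F(s, a) for s = (x, v, R, W) and a = (f, M)."""
    x, v, R, W = state
    f, M = action
    x_dot = v
    v_dot = GRAV * E3 - (f / MASS) * R @ E3
    R_dot = R @ hat(W)
    W_dot = np.linalg.solve(J, M - np.cross(W, J @ W))
    return x_dot, v_dot, R_dot, W_dot


def act(Rg, s):
    """Assumed action: rotate inertial-frame x, v, R; keep body-frame W."""
    x, v, R, W = s
    return Rg @ x, Rg @ v, Rg @ R, W


def equivariance_gap(Rg, seed=0):
    """Max |F(g.s, g.a) - g.F(s, a)|, with (f, M) left unchanged by g."""
    rng = np.random.default_rng(seed)
    s = (rng.normal(size=3), rng.normal(size=3),
         expm_so3(rng.normal(size=3)), rng.normal(size=3))
    a = (15.0, rng.normal(size=3))
    lhs = F(act(Rg, s), a)
    rhs = act(Rg, F(s, a))  # here tangent vectors transform like the state
    return max(np.max(np.abs(l - r)) for l, r in zip(lhs, rhs))


yaw_rotation = expm_so3(0.7 * E3)                     # element of SO(3)_{e3}
roll_rotation = expm_so3(np.array([0.7, 0.0, 0.0]))  # not about e3
print(equivariance_gap(yaw_rotation) < 1e-9)   # True
print(equivariance_gap(roll_rotation) > 1e-3)  # True: gravity breaks it
```

The only term that could break the symmetry is gravity, $g\vec{e}_3$. It survives the action exactly when $g\vec{e}_3 = \vec{e}_3$, that is, for rotations about the vertical axis.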

Figures (8)

  • Figure 1: Illustration of the group action corresponding to the rotation about the vertical axis $\vec{e}_3$. A group element $g \in \mathsf{SO(3)}_{\vec{e}_3}$ (green), which corresponds to $\mathsf{SO(2)}$ embedded in $\mathsf{SO(3)}$ as a subgroup by fixing the axis of rotation as $\vec{e}_3$, acts on a quadrotor state $s$ (purple) by rotating it to a new state $g \cdot s$ (blue). The orbit of $s$ under this action, denoted by $G \cdot s$, is the set of all points reachable by rotations about $\vec{e}_3$, and its projection on the position space corresponds to a circle (red). In the proposed equivariant RL, the control policy learned at a single point on the orbit is automatically generalized to any other points on the orbit.
  • Figure 2: A schematic overview of the system. a. During training, we train RL policies for quadrotor low-level control tasks in simulation. (a) A custom simulator serves as the training environment, providing full access to the quadrotor's dynamics and state. (b) A monolithic end-to-end policy directly outputs the total thrust $f$ and moments $M$. (c) Two specialized modules independently control translational and yaw motions, each selecting its action based on its local observations. b. When trained policies are transferred from simulation to the physical world, a sim-to-real gap arises from mismatches between simulation and reality. To bridge this gap, domain randomization is applied during training. (d) The indoor flight test facility at the Flight Dynamics and Control Lab, GWU, used for real-world deployment. A supplementary video of the RL training and real-world experiments is available at https://youtu.be/TGBQTuKpbAw.
  • Figure 3: Illustration of the group action corresponding to the reflection symmetry with respect to the finite cyclic subgroup $G = \{ 1, -1 \}$. The reflectional action $g \in G$ transforms the original state-action pair $(s, a)$ (left) into the reflected pair $g(s, a)$ (right), preserving the symmetry of the yaw error dynamics.
  • Figure 4: Illustration of equivariant actors and invariant critics.
  • Figure 5: Benchmarks of four RL frameworks trained with (a) PPO, (b) TD3, and (c) SAC RL algorithms. Each plot depicts the learning curves for Mono-MLP (green), Mono-EMLP (blue), Mod-MLP (orange), and Mod-EMLP (red).
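Figures 3 and 4 pair an equivariant actor with an invariant critic. For the two-element reflection group $G = \{1, -1\}$, group averaging (symmetrization) of an ordinary MLP is one simple way to get both properties. This is a swapped-in technique for illustration. It is not the equivariant MLP (EMLP) layers that the model names Mono-EMLP and Mod-EMLP suggest. The sketch assumes a hypothetical yaw module whose state is (yaw error, yaw rate) and whose action is a yaw moment, with $g = -1$ negating both. All names, sizes, and weights are illustrative.

```python
import numpy as np

# Hypothetical yaw module: s = (yaw error, yaw rate), a = (yaw moment,).
# Assumed reflection g = -1 acts on both s and a by negation.
rng = np.random.default_rng(0)


def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a one-hidden-layer tanh MLP."""
    return (rng.normal(size=(hidden, in_dim)), rng.normal(size=hidden),
            rng.normal(size=(out_dim, hidden)), rng.normal(size=out_dim))


def mlp(params, z):
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ z + b1) + b2


actor_params = init_mlp(2, 16, 1)
critic_params = init_mlp(3, 16, 1)


def actor(s):
    """Equivariant by group averaging: actor(-s) == -actor(s)."""
    return 0.5 * (mlp(actor_params, s) - mlp(actor_params, -s))


def critic(s, a):
    """Invariant by group averaging: critic(-s, -a) == critic(s, a)."""
    z = np.concatenate([s, a])
    return 0.5 * (mlp(critic_params, z) + mlp(critic_params, -z))


s = np.array([0.3, -1.2])
a = actor(s)
print(np.allclose(actor(-s), -a))                 # True
print(np.allclose(critic(-s, -a), critic(s, a)))  # True
```

Averaging over the group builds the symmetry into the architecture, so it holds for any weights, before or after training. As a direct consequence, `actor(np.zeros(2))` is exactly zero: the sketch commands no yaw moment at zero yaw error and zero yaw rate.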

Theorems & Definitions (9)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Remark 1