Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control
Beomyeol Yu, Taeyoung Lee
TL;DR
The paper addresses data efficiency and generalization in reinforcement learning (RL) for quadrotor low-level control by exploiting the geometry of the problem through group-equivariant networks. It develops two equivariant RL (ERL) frameworks, one monolithic and one modular, that encode the rotational and reflectional symmetries of the quadrotor dynamics. This shrinks the portion of the state-action space that must be learned and improves sample efficiency, with theoretical guarantees that the optimal value function $V^*$ is $G$-invariant and the optimal policy $\pi^*$ is $G$-equivariant. Results in simulation and in indoor real-world flights show that the equivariant models, especially the modular one (Mod-EMLP), converge faster and track better, including in yaw, while enabling zero-shot sim-to-real transfer. Overall, the work advances geometric RL for robotics by demonstrating practical gains in data efficiency, robustness, and transferability for quadrotor control, and it lays groundwork for symmetry-aware learning in other symmetric robotic systems.
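The invariance and equivariance properties mentioned above can be written out as follows. This is a sketch in generic notation, assuming a group $G$ acting on states $s$ and actions $a$, a transition kernel $P$, and a reward $R$. These symbols and the exact form of the conditions are assumptions for illustration and may differ from the paper's own formulation.

$$
\begin{aligned}
&P(g \cdot s' \mid g \cdot s,\; g \cdot a) = P(s' \mid s, a), \qquad R(g \cdot s,\; g \cdot a) = R(s, a) \qquad \text{for all } g \in G \\
&\Longrightarrow \quad V^*(g \cdot s) = V^*(s), \qquad \pi^*(g \cdot s) = g \cdot \pi^*(s).
\end{aligned}
$$

In words, if transforming the state and action together changes neither the dynamics nor the reward, then the optimal value is unchanged by the transformation and the optimal action transforms along with the state. When the optimal policy is not unique, the second statement should be read as saying that an equivariant optimal policy exists.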
Abstract
Improving sample efficiency and generalization capability is critical for the successful data-driven control of quadrotor unmanned aerial vehicles (UAVs), which are inherently unstable. While various reinforcement learning (RL) approaches have been applied to autonomous quadrotor flight, they often require extensive training data, posing multiple challenges and safety risks in practice. To address these issues, we propose data-efficient, equivariant monolithic and modular RL frameworks for quadrotor low-level control. Specifically, by identifying the rotational and reflectional symmetries in quadrotor dynamics and encoding these symmetries into equivariant network models, we remove redundant learning over the state-action space. This approach enables the optimal control action learned in one configuration to generalize automatically to other configurations via symmetry, thereby enhancing data efficiency. Experimental results demonstrate that our equivariant approaches significantly outperform their non-equivariant counterparts in terms of learning efficiency and flight performance.
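To make concrete the idea that an action learned in one configuration carries over to others by symmetry, here is a minimal NumPy sketch. It is not the paper's method: the paper encodes symmetry in equivariant network layers, whereas this sketch swaps in a simpler technique, canonicalization, in which the inputs are rotated into a reference frame, the policy is evaluated there, and the output is rotated back. The function names, the stand-in `base_policy`, and the restriction to rotations about the vertical axis acting on a position error `e_x` and velocity error `e_v` are all assumptions made for illustration.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the vertical (third) axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def base_policy(e_x, e_v):
    """Hypothetical stand-in for a learned policy; it has no built-in symmetry."""
    W = np.array([[0.9, 0.2, -0.1], [0.3, -0.7, 0.4], [-0.2, 0.1, 1.1]])
    return np.tanh(W @ e_x) + 0.5 * np.tanh(W.T @ e_v)


def equivariant_policy(e_x, e_v):
    """Wrap base_policy so that it commutes with rotations about the vertical axis.

    The inputs are rotated so that the horizontal part of e_x points along the
    first axis, the base policy is evaluated in that canonical frame, and the
    result is rotated back. The heading is undefined when the horizontal part
    of e_x is zero, so the construction does not cover that case.
    """
    heading = np.arctan2(e_x[1], e_x[0])
    R = rot_z(heading)
    return R @ base_policy(R.T @ e_x, R.T @ e_v)


def equivariance_gap(policy, e_x, e_v, theta):
    """Norm of policy(g s) - g policy(s) for the rotation g = rot_z(theta)."""
    R = rot_z(theta)
    return np.linalg.norm(policy(R @ e_x, R @ e_v) - R @ policy(e_x, e_v))


# Example inputs (arbitrary values chosen for illustration).
e_x = np.array([1.0, 0.5, -0.3])
e_v = np.array([0.2, -0.4, 0.1])
theta = 0.7

print("gap, base policy:       ", equivariance_gap(base_policy, e_x, e_v, theta))
print("gap, equivariant policy:", equivariance_gap(equivariant_policy, e_x, e_v, theta))
```

The wrapped policy has a gap at the level of floating-point round-off, while the unwrapped one does not. The wrapped policy therefore only needs to be correct on the canonical slice where the position error has no component along the second axis, and rotation supplies the rest. That is the sense in which symmetry removes redundancy from the learning problem.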
