Multi-Agent Reinforcement Learning for the Low-Level Control of a Quadrotor UAV

Beomyeol Yu; Taeyoung Lee

TL;DR

The paper tackles low-level quadrotor control, where single-agent reinforcement learning (SARL) struggles with the coupling between yaw and translational motion and is data-intensive to train. It introduces two multi-agent reinforcement learning (MARL) frameworks that assign the translational and yaw dynamics to dedicated agents, and adds regularization and integral terms to improve stability and reduce steady-state errors. Both Decentralized Training with Decentralized Execution (DTDE) and Centralized Training with Decentralized Execution (CTDE) converge faster and perform more robustly than SARL, with CTDE achieving better coordination through centralized critics. Sim-to-sim experiments show that the decoupled MARL approaches achieve superior tracking and stability, suggesting potential for real-world deployment once combined with domain randomization and validated in future sim-to-real transfer work.
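
The TL;DR credits integral and regularization terms with reducing steady-state error and excessive maneuvers, but the exact formulation is not given in this summary. The following is a minimal sketch under stated assumptions: the integral term is a clamped running sum of the position error, and the reward penalizes that integral together with the change between consecutive actions. The class `IntegralError`, the function `shaped_reward`, the weights `w_x`, `w_i`, `w_a`, and the clamp `i_max` are all hypothetical and are not the paper's values.

```python
import numpy as np


class IntegralError:
    """Clamped running integral of a tracking error (hypothetical anti-windup form)."""

    def __init__(self, dim: int, dt: float, i_max: float = 1.0):
        self.dt = dt
        self.i_max = i_max
        self.value = np.zeros(dim)

    def update(self, error: np.ndarray) -> np.ndarray:
        # Forward-Euler integration, then element-wise clamping to limit windup.
        self.value = np.clip(self.value + error * self.dt, -self.i_max, self.i_max)
        return self.value

    def reset(self) -> None:
        # Clear the accumulated error, e.g. at the start of an episode.
        self.value[:] = 0.0


def shaped_reward(e_x, e_i, action, prev_action, w_x=1.0, w_i=0.1, w_a=0.05):
    """Hypothetical reward: tracking term, integral penalty, action-change penalty."""
    tracking = -w_x * np.linalg.norm(e_x)
    integral = -w_i * np.linalg.norm(e_i)
    smoothness = -w_a * np.linalg.norm(np.asarray(action) - np.asarray(prev_action))
    return tracking + integral + smoothness
```

Clamping the integral is a common anti-windup choice; whether the paper clamps, decays, or resets its integral term, and whether it enters the reward, the observation, or both, is not stated here.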

Abstract

By leveraging the underlying structures of the quadrotor dynamics, we propose multi-agent reinforcement learning frameworks to innovate the low-level control of a quadrotor, where independent agents operate cooperatively to achieve a common goal. While single-agent reinforcement learning has been successfully applied in quadrotor controls, training a large monolithic network is often data-intensive and time-consuming. Moreover, achieving agile yawing control remains a significant challenge due to the strongly coupled nature of the quadrotor dynamics. To address this, we decompose the quadrotor dynamics into translational and yawing components and assign collaborative reinforcement learning agents to each part to facilitate more efficient training. Additionally, we introduce regularization terms to mitigate steady-state errors and prevent excessive maneuvers. Benchmark studies, including sim-to-sim transfer verification, demonstrate that our proposed training schemes substantially improve the convergence rate of training, while enhancing flight control performance and stability compared to traditional single-agent approaches.
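
The abstract decomposes the dynamics into translational and yawing parts handled by separate agents, and Figure 1 recombines the agents' actions into motor thrusts $T_{1:4}$ through a mixer whose equation is not reproduced in this summary. The following is a minimal sketch, assuming a common "+"-configuration allocation matrix, with Agent #1 supplying the total thrust $f$ and roll/pitch moments $(M_1, M_2)$ and Agent #2 supplying the yaw moment $M_3$. The arm length `d`, the rotor torque coefficient `c_tf`, the rotor layout, and this split of actions are assumptions, not the paper's definitions.

```python
import numpy as np


def allocation_matrix(d: float, c_tf: float) -> np.ndarray:
    """Map motor thrusts T_1..T_4 to [f, M1, M2, M3] for an assumed '+' layout."""
    return np.array([
        [1.0, 1.0, 1.0, 1.0],        # total thrust f
        [0.0, -d, 0.0, d],           # roll moment M1
        [d, 0.0, -d, 0.0],           # pitch moment M2
        [-c_tf, c_tf, -c_tf, c_tf],  # yaw moment M3 from rotor drag torque
    ])


def mix(agent1_action, agent2_action, d: float, c_tf: float) -> np.ndarray:
    """Combine Agent #1 (f, M1, M2) and Agent #2 (M3) into motor thrusts T_1..T_4.

    The split of the wrench between the two agents is a hypothetical choice.
    """
    f, m1, m2 = agent1_action
    (m3,) = agent2_action
    wrench = np.array([f, m1, m2, m3])
    # Solve A @ T = wrench instead of forming an explicit matrix inverse.
    return np.linalg.solve(allocation_matrix(d, c_tf), wrench)
```

For example, with the arbitrary illustrative values `d=0.23` and `c_tf=0.0135`, `mix((4.0, 0.0, 0.0), (0.0135,), 0.23, 0.0135)` yields thrusts of 0.75, 1.25, 0.75, 1.25: a pure yaw command redistributes thrust between the two opposite rotor pairs without changing the total thrust or the roll/pitch moments.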

Paper Structure

This paper contains 17 sections, 16 equations, 5 figures, and 3 tables.

Introduction
Related work
Backgrounds
Single-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning
Quadrotor Dynamics
Decoupled Yaw Control System
Multi-Agent RL for Quadrotor Control
Single-Agent RL Framework
Multi-Agent RL Frameworks
Policy Regularization
Integral Terms for Steady-State Error
Numerical Experiments
Implementation
Benchmark Results
...and 2 more sections

Figures (5)

Figure 1: MARL structure for the low-level control of a quadrotor: Agent $\#$1 controls the roll/pitch dynamics and the position, and Agent $\#$2 controls the yaw. Each agent receives its own observation ($o_1$ or $o_2$) and reward ($r_1$ or $r_2$). Using their policy networks, the agents select the optimal joint action $(a^*_1, a^*_2)$, which the mixer equation converts into motor thrusts $T_{1:4}$.
Figure 2: Three training schemes, in single- and multi-agent settings, for quadrotor control tasks. (left) Single-Agent Reinforcement Learning (SARL) uses one large end-to-end policy that directly outputs motor thrust. (middle) In the Decentralized Training with Decentralized Execution (DTDE) setting, each agent updates its own policy without explicit information exchange with the other. (right) Centralized Training with Decentralized Execution (CTDE) allows agents to share experience during training, but at execution time each agent acts on its own local observations and policy.
Figure 3: Simulation environments: (left) simplified custom environment for training; (right) real-time physics environment, gym-pybullet-drones [panerati2021learning], used for sim-to-sim verification of each scheme: SARL (green), DTDE (blue), and CTDE (red).
Figure 4: Learning curves for each framework over timesteps. The solid lines and shaded areas represent the average and standard deviation of performance, respectively.
Figure 5: Flight performance comparison of SARL (green), DTDE (blue), and CTDE (red) in a single episode with $e_{b_1}(0) = 150^{\circ}$ under the same initial conditions: (a) simulation in the training environment; (b) simulation in the PyBullet-based physics environment, representing the sim-to-sim transfer. The left column shows the position error $e_x = x - x_d \in \mathbb{R}^3$, and the right column shows the attitude error $e_R = \frac{1}{2} (R_d^T R - R^T R_d)^\vee \in \mathbb{R}^3$.
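
Figure 5 plots the position error $e_x = x - x_d$ and the attitude error $e_R = \frac{1}{2}(R_d^T R - R^T R_d)^\vee$, where $(\cdot)^\vee$ maps a $3\times 3$ skew-symmetric matrix to a vector in $\mathbb{R}^3$. The following minimal NumPy sketch computes both quantities; the helper names `vee`, `position_error`, `attitude_error`, and `rot_z` are illustrative and not taken from the paper's code.

```python
import numpy as np


def vee(S: np.ndarray) -> np.ndarray:
    """Inverse hat map: [[0,-a3,a2],[a3,0,-a1],[-a2,a1,0]] -> (a1, a2, a3)."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])


def position_error(x, x_d) -> np.ndarray:
    """e_x = x - x_d in R^3."""
    return np.asarray(x, dtype=float) - np.asarray(x_d, dtype=float)


def attitude_error(R: np.ndarray, R_d: np.ndarray) -> np.ndarray:
    """e_R = 0.5 * (R_d^T R - R^T R_d)^vee in R^3."""
    return 0.5 * vee(R_d.T @ R - R.T @ R_d)


def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix for a yaw angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For instance, `attitude_error(rot_z(np.deg2rad(150.0)), np.eye(3))` returns approximately `[0, 0, 0.5]`: for a pure yaw offset $\theta$ the error magnitude is $\sin\theta$, so a 150° offset yields a smaller $e_R$ than a 90° offset.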
