Multitask Reinforcement Learning for Quadcopter Attitude Stabilization and Tracking using Graph Policy

Yu Tang Liu, Afonso Vale, Aamir Ahmad, Rodrigo Ventura, Meysam Basiri

TL;DR

This work tackles quadcopter attitude control for both tracking and aggressive stabilization by introducing a Graph-Convolutional-Network policy trained with multitask Soft Actor-Critic in IsaacGym. The approach leverages explicit domain priors through a learnable graph to fuse state, task, and environment information, achieving faster learning and higher sample efficiency, and enabling a compact onboard controller that runs at 400 Hz on a Pixhawk. Sim-to-real transfer is facilitated by RMA-based domain adaptation and a lightweight adaptor, with real-world tests demonstrating robust tracking and stabilization, including recovery from free-fall. While the method delivers strong onboard performance, it faces challenges in fully decoupling task components and may benefit from larger graph architectures or compression techniques for further gains.

Abstract

Quadcopter attitude control involves two tasks: smooth attitude tracking and aggressive stabilization from arbitrary states. Although both can be formulated as tracking problems, their distinct state spaces and control strategies complicate a unified reward function. We propose a multitask deep reinforcement learning framework that leverages parallel simulation with IsaacGym and a Graph Convolutional Network (GCN) policy to address both tasks effectively. Our multitask Soft Actor-Critic (SAC) approach achieves faster, more reliable learning and higher sample efficiency than single-task methods. We validate its real-world applicability by deploying the learned policy - a compact two-layer network with 24 neurons per layer - on a Pixhawk flight controller, achieving 400 Hz control without extra computational resources. We provide our code at https://github.com/robot-perception-group/GraphMTSAC_UAV/.

Paper Structure

This paper contains 28 sections, 7 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Existing Multitask Policy Architectures. The notation $s_t,w_t,a_t$ stands for state, task, and action at time step $t$, respectively. $FF$ stands for a standard feed-forward network with an activation function. The red modules are known as context encoders. Many variations of the context encoder exist, and designing one is in general non-trivial. For simplicity, we consider only encoders with linear layers as our baselines.
  • Figure 2: Illustration of our GCN-based policy. The policy observes the robot state $s_t$, the task weight $w_t$, and the environment latent variables $e_t$. These inputs are first passed through feed-forward layers $\text{FFX}$ to produce node embeddings. The GCN then updates the embeddings by multiplying them with a linear layer $W^{(l)}$ (not shown) and an adjacency matrix $A$ constructed by stacking the graph edges, resulting in action-node embeddings. Each action node has a recurrent edge to itself, incorporating the previous action. Finally, we project the action embeddings to scalar values for the control commands. A Kalman filter provides state estimates, and a history buffer feeds the RMA adapter.
  • Figure 3: GCN weight visualization. Each row represents a different action dimension (e.g., roll, pitch, yaw), and each column corresponds to an input dimension from the state or task embeddings. Darker cells indicate stronger learned correlations.
  • Figure 4: Simulation results. Unless otherwise specified, the default network configuration uses 24 neurons per layer. Each experiment was repeated three times with different seeds; the main curve represents the mean performance, and the shaded area indicates the range between the maximum and minimum values.
  • Figure 5: Real-world results. The tracking task uses the task weight $[1,1,1,1,0,0,0]$, whereas the stabilization task uses the task weight $[1,0,0,0,0,0,0]$.
  • ...and 1 more figures
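
The GCN update described in the Figure 2 caption can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes one propagation step of the common form $H' = \tanh(A H W)$, a fixed row-normalized adjacency matrix (the TL;DR describes the paper's graph as learnable), three input nodes (state, task, environment), three action nodes, and an embedding width of 4. All function names, sizes, and random weights below are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adj):
    """Scale each row to sum to 1 so a node averages over its neighbours."""
    return [[v / sum(row) for v in row] for row in adj]


def gcn_layer(adj, h, w):
    """One GCN update under the assumed form H' = tanh(A H W)."""
    return [[math.tanh(v) for v in row] for row in matmul(matmul(adj, h), w)]


def policy_forward(adj, input_embeddings, prev_action_embeddings, w, proj):
    """Stack node embeddings, propagate once, project action nodes to scalars."""
    h = input_embeddings + prev_action_embeddings  # input nodes first, then action nodes
    h_next = gcn_layer(row_normalize(adj), h, w)
    action_rows = h_next[len(input_embeddings):]
    return [sum(x * p for x, p in zip(row, proj)) for row in action_rows]


def random_matrix(rng, rows, cols):
    """Stand-in for learned weights or embeddings (assumption: uniform in [-1, 1])."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes, not taken from the paper.
N_IN, N_ACT, DIM = 3, 3, 4
rng = random.Random(0)

# Adjacency: rows are receiving nodes, columns are sending nodes.
adj = [[0.0] * (N_IN + N_ACT) for _ in range(N_IN + N_ACT)]
for i in range(N_IN + N_ACT):
    adj[i][i] = 1.0  # self-loop; for action nodes this carries the previous action
for act in range(N_IN, N_IN + N_ACT):
    for src in range(N_IN):
        adj[act][src] = 1.0  # each action node listens to state, task, environment

inputs = random_matrix(rng, N_IN, DIM)   # stand-in for the feed-forward node embeddings
prev = random_matrix(rng, N_ACT, DIM)    # stand-in for embedded previous actions
w = random_matrix(rng, DIM, DIM)         # linear layer W
proj = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]  # projection to a scalar command

actions = policy_forward(adj, inputs, prev, w, proj)
print(actions)
```

Setting an entry of `adj` to zero removes the corresponding edge, which is one way a domain prior can be encoded: an action node with no edge from the task node produces the same command regardless of the task embedding.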