Multi-Task Reinforcement Learning for Quadrotors

Jiaxu Xing, Ismail Geles, Yunlong Song, Elie Aljalbout, Davide Scaramuzza

TL;DR

The paper tackles the challenge of creating a generalist quadrotor controller capable of multiple tasks without retraining. It introduces a multi-task reinforcement learning framework that shares information through a dynamics-aware encoder and uses a multi-critic setup to handle task-specific rewards. The approach enables a single policy to perform high-speed stabilization, autonomous racing, and velocity tracking, validated in both simulation (Flightmare) and real-world flights. Results show improved sample efficiency and robust cross-task performance compared with single-task baselines, marking a step toward versatile, real-world quadrotor systems.

Abstract

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance.

Paper Structure

This paper contains 27 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed approach performs three distinct tasks for quadrotor control in the real world. The resulting single MTRL policy can (Top) stabilize the quadrotor from high speed, (Middle) autonomously race through a fixed track, and (Bottom) track randomly generated velocities.
  • Figure 2: Diagram of quadrotor model with the world and body frames and propeller numbering convention.
  • Figure 3: Our MTRL framework utilizes a shared encoder for observations related to the quadrotor dynamics across all tasks. The embedding output from the shared encoder is then merged with the task-specific observation (e.g., the gate observation from the racing task and the desired velocity from the tracking task) to create a task-specific embedding. The policy generates control commands from the concatenated embedding (64), which joins the shared embedding (32) and the task-specific embedding (32). A separate critic function is used for each task; the critics are not employed during deployment.
  • Figure 4: Comparison of average return across the different tasks. Our proposed MTRL approach achieves a higher average return than the single-task RL baselines within the same number of training steps. Notably, single-task RL policies still perform comparably to the MTRL approach when only the actor network is shared.
  • Figure 5: Illustration of one racing policy rollout. The policy successfully completes a Figure-8 race track, which consists of six gates, with a 100% success rate.
  • ...and 1 more figure
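
The architecture described in the Figure 3 caption can be sketched as a small forward pass. This is a minimal illustration, not the authors' implementation: only the embedding sizes (32 + 32 = 64) and the overall wiring (shared encoder, task-specific encoder, single policy, one critic per task) come from the caption. The class name `MultiTaskActorCritic`, the observation dimensions, the 4-dimensional action, the single-layer `tanh` networks, and the choice to feed the critics the same 64-dimensional embedding are all assumptions made for the example.

```python
import math
import random

SHARED_DIM = 32  # shared embedding size (from the Figure 3 caption)
TASK_DIM = 32    # task-specific embedding size (from the Figure 3 caption)
ACTION_DIM = 4   # assumption: a 4-dimensional control command


def make_linear(n_in, n_out, rng):
    """Create (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(max(n_in, 1))
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer to the input vector x (a list of floats)."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def tanh_layer(layer, x):
    """Dense layer followed by a tanh nonlinearity."""
    return [math.tanh(v) for v in linear(layer, x)]


class MultiTaskActorCritic:
    """Hypothetical sketch: shared encoder, per-task encoders, one policy, per-task critics."""

    def __init__(self, dyn_dim, task_obs_dims, seed=0):
        rng = random.Random(seed)
        # Shared across all tasks: encodes dynamics-related observations.
        self.shared_encoder = make_linear(dyn_dim, SHARED_DIM, rng)
        # Per task: merges the shared embedding with the task-specific observation.
        self.task_encoders = {
            task: make_linear(SHARED_DIM + dim, TASK_DIM, rng)
            for task, dim in task_obs_dims.items()
        }
        # A single policy head serves every task.
        self.policy = make_linear(SHARED_DIM + TASK_DIM, ACTION_DIM, rng)
        # One critic per task, needed only during training.
        # Assumption: critics read the same 64-dim embedding as the policy.
        self.critics = {
            task: make_linear(SHARED_DIM + TASK_DIM, 1, rng) for task in task_obs_dims
        }

    def embed(self, task, dyn_obs, task_obs):
        shared = tanh_layer(self.shared_encoder, list(dyn_obs))
        task_emb = tanh_layer(self.task_encoders[task], shared + list(task_obs))
        return shared + task_emb  # concatenation: 32 + 32 = 64 values

    def act(self, task, dyn_obs, task_obs):
        """Deployment path: no critic is evaluated here."""
        return tanh_layer(self.policy, self.embed(task, dyn_obs, task_obs))

    def value(self, task, dyn_obs, task_obs):
        """Training path: query the critic that belongs to the given task."""
        return linear(self.critics[task], self.embed(task, dyn_obs, task_obs))[0]


# Hypothetical observation sizes, chosen only for illustration.
model = MultiTaskActorCritic(
    dyn_dim=13,
    task_obs_dims={"stabilization": 0, "racing": 12, "tracking": 3},
)
command = model.act("tracking", [0.0] * 13, [1.0, 0.0, 0.5])
```

Keeping the critics in a separate dictionary mirrors the caption's point that they are a training-time component: `act` never touches them, so they can be discarded when the policy is deployed.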