MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers

Jin Zhou; Dongcheng Cao; Xian Wang; Shuo Li

TL;DR

This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Its core component is a novel predictive context encoder, which learns to infer a latent representation of the system dynamics from interaction history.

Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for achieving online agile navigation with quadrotors. Despite this success, policies trained via standard RL typically fail to generalize across significant dynamic variations, exhibiting a critical lack of adaptability. This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Our approach features a novel predictive context encoder, which learns to infer a latent representation of the system dynamics from interaction history. We demonstrate our method in agile waypoint traversal tasks under two challenging scenarios: large variations in quadrotor mass and severe single-rotor thrust loss. We leverage a GPU-vectorized simulator to distribute tasks across thousands of parallel environments, overcoming the long training times of meta-RL to converge in less than an hour. Through extensive experiments in both simulation and the real world, we validate that MAVEN achieves superior adaptation and agility. The policy successfully executes zero-shot sim-to-real transfer, demonstrating robust online adaptation by performing high-speed maneuvers despite mass variations of up to 66.7% and single-rotor thrust losses as severe as 70%.
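
As a rough illustration of how tasks with varying dynamics might be distributed across parallel environments, the sketch below draws one set of dynamics parameters per environment. It is not the authors' implementation: the `Task` container, the `sample_tasks` function, the uniform sampling, and the 0.33 kg nominal mass are assumptions made for this example. The 260 g to 550 g mass range and the 70% maximum thrust loss are values reported for the experiments, reused here only as illustrative sampling bounds.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Dynamics parameters for one simulated quadrotor (hypothetical container)."""
    mass_kg: float       # total vehicle mass
    rotor_index: int     # index (0-3) of the rotor that may be degraded
    thrust_scale: float  # remaining thrust fraction of that rotor; 1.0 = healthy


def sample_tasks(num_envs, scenario, mass_range=(0.26, 0.55),
                 max_thrust_loss=0.7, nominal_mass_kg=0.33, seed=0):
    """Draw one Task per parallel environment.

    scenario="mass":        mass is sampled uniformly, all rotors healthy.
    scenario="thrust_loss": mass stays nominal, one random rotor loses
                            between 0% and max_thrust_loss of its thrust.
    Uniform sampling and the nominal mass are assumptions of this sketch.
    """
    if scenario not in ("mass", "thrust_loss"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    rng = random.Random(seed)
    tasks = []
    for _ in range(num_envs):
        rotor = rng.randrange(4)
        if scenario == "mass":
            tasks.append(Task(rng.uniform(*mass_range), rotor, 1.0))
        else:
            loss = rng.uniform(0.0, max_thrust_loss)
            tasks.append(Task(nominal_mass_kg, rotor, 1.0 - loss))
    return tasks


# One task per environment, e.g. for a few thousand parallel simulations.
mass_tasks = sample_tasks(num_envs=4096, scenario="mass")
loss_tasks = sample_tasks(num_envs=4096, scenario="thrust_loss")
```

In a vectorized simulator, these per-environment parameters would typically be packed into arrays so that every environment steps with its own mass and rotor health in a single batched update.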

Paper Structure (19 sections, 13 equations, 10 figures, 3 tables, 2 algorithms)

This paper contains 19 sections, 13 equations, 10 figures, 3 tables, 2 algorithms.

Introduction
Methodology
Quadrotor Dynamics
Problem Statement
Meta-RL Framework
Task inference
Policy optimization
Predictive Context Encoder
Policy Training and Deployment
Simulation Results and Analysis
Simulation setups and baselines
Mass variation
Thrust loss
Experiment Setup and Result
Experiment Setup
...and 4 more sections

Figures (10)

Figure 1: Demonstration of our policy performing three consecutive flights in the mass variation scenario without landing. (a) We alter the quadrotor mass with a magnet payload. (b) Illustration of a quadrotor with varying masses.
Figure 2: Overview of our meta-RL framework for online adaptation to quadrotor dynamics. The policy is trained in parallelized simulation environments on tasks with varying dynamics (mass and thrust loss). A novel predictive context encoder learns to infer a latent variable that conditions the policy network, enabling task-aware adaptation. For deployment, the trained policy performs real-time task inference and navigation on an onboard computer.
Figure 3: Comparison of trajectories on a switchback track for quadrotors with varying mass (260 g, 330 g, 440 g, and 550 g). Shaded areas denote waypoints with a radius of 1.0 m. Spots and stars denote the starting and ending waypoints, respectively. Although both RL-DR and our method use a single policy, our approach yields trajectories similar to the mass-specific RL, whereas RL-DR exhibits unnecessary detours at certain masses.
Figure 4: Flight trajectories and completion times for our policy and two baselines (mass-specific RL and RL-DR) on two challenging tracks. Our method's performance closely resembles that of the mass-specific expert, avoiding the inefficient detours and braking exhibited by RL-DR.
Figure 5: Trajectories of our method against varying degrees of thrust loss, with the result of a standard RL method as a baseline.
...and 5 more figures
