
MAVEN: A Meta-Reinforcement Learning Framework for Varying-Dynamics Expertise in Agile Quadrotor Maneuvers

Jin Zhou, Dongcheng Cao, Xian Wang, Shuo Li

TL;DR

This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Its core component is a novel predictive context encoder that learns to infer a latent representation of the system dynamics from interaction history.
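The predictive context encoder is described here only at a high level. The NumPy sketch below illustrates one plausible form. It assumes a mean-pooled MLP that maps a window of past (state, action, next-state) transitions to a latent z, and a decoder that predicts the state change on separate target transitions from (s, a, z). The intuition is that z must carry dynamics information, such as mass or rotor health, in order to predict how actions move the state. All function names, layer sizes, and state/action dimensions are hypothetical and are not taken from the paper. The sketch covers only the forward pass and loss. Actual training would also need gradients, for example from an autodiff library.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Hypothetical helper: random (weight, bias) pairs for a small MLP."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def encode_context(enc, states, actions, next_states):
    """Embed each past transition, then mean-pool over the window to get z."""
    feats = np.concatenate([states, actions, next_states - states], axis=-1)
    return mlp(enc, feats).mean(axis=0)

def predictive_loss(enc, dec, context, target):
    """Infer z from the context window, predict the state change of separate
    target transitions from (s, a, z), and return the mean squared error."""
    z = encode_context(enc, *context)
    s, a, s_next = target
    z_tiled = np.broadcast_to(z, (s.shape[0], z.shape[0]))
    pred_delta = mlp(dec, np.concatenate([s, a, z_tiled], axis=-1))
    return float(np.mean((pred_delta - (s_next - s)) ** 2))

# Hypothetical sizes: 13-D state, 4 rotor commands, 8-D latent, 32-step window.
rng = np.random.default_rng(0)
S, A, Z, K = 13, 4, 8, 32
enc = init_mlp(rng, [2 * S + A, 64, Z])
dec = init_mlp(rng, [S + A + Z, 64, S])

def fake_batch(n):
    """Random transitions standing in for simulator rollouts."""
    s = rng.standard_normal((n, S))
    a = rng.standard_normal((n, A))
    return s, a, s + 0.01 * rng.standard_normal((n, S))

context, target = fake_batch(K), fake_batch(8)
z = encode_context(enc, *context)
loss = predictive_loss(enc, dec, context, target)
print(z.shape, loss)
```

Mean pooling makes z independent of the order of transitions within the window. This is a design choice of the sketch; a recurrent or attention-based encoder would work equally well as an illustration.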

Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for online agile navigation with quadrotors. Despite this success, policies trained via standard RL typically fail to generalize across significant variations in dynamics, exhibiting a critical lack of adaptability. This work introduces MAVEN, a meta-RL framework that enables a single policy to achieve robust end-to-end navigation across a wide range of quadrotor dynamics. Our approach features a novel predictive context encoder, which learns to infer a latent representation of the system dynamics from interaction history. We demonstrate our method on agile waypoint traversal tasks under two challenging scenarios: large variations in quadrotor mass and severe single-rotor thrust loss. We leverage a GPU-vectorized simulator to distribute tasks across thousands of parallel environments, which overcomes the long training times of meta-RL and lets training converge in less than an hour. Through extensive experiments in both simulation and the real world, we validate that MAVEN achieves superior adaptation and agility. The policy transfers zero-shot from simulation to the real world and adapts online, performing high-speed maneuvers despite mass variations of up to 66.7% and single-rotor thrust losses as severe as 70%.
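The two scenarios can be expressed as per-environment task parameters sampled in a single vectorized call. The sketch below uses NumPy arrays to stand in for the GPU simulator's batched state and makes several assumptions. The mass range of 260 to 550 g is taken from the test masses in Figure 3, since the actual training range is not stated here. Thrust loss is modeled as a multiplicative health factor on one randomly chosen rotor, with the loss drawn from [0, 70%). The environment count, all names, and all ranges are hypothetical. For scale, if 330 g is taken as the nominal mass (also an assumption), a 66.7% increase gives 330 × (1 + 2/3) = 550 g, the heaviest mass in Figure 3.

```python
import numpy as np

NUM_ENVS = 4096            # "thousands of parallel environments" (exact count assumed)
NOMINAL_MASS_KG = 0.330    # hypothetical nominal mass
MAX_THRUST_LOSS = 0.70     # most severe single-rotor loss quoted in the abstract

def sample_mass_tasks(rng, num_envs, low=0.26, high=0.55):
    """Mass-variation scenario: one mass (kg) per environment, uniform in [low, high)."""
    return rng.uniform(low, high, num_envs)

def sample_thrust_loss_tasks(rng, num_envs, max_loss=MAX_THRUST_LOSS):
    """Thrust-loss scenario: per-rotor health factors, one degraded rotor per env."""
    failed_rotor = rng.integers(0, 4, num_envs)   # index of the degraded rotor
    loss = rng.uniform(0.0, max_loss, num_envs)   # fraction of thrust lost
    scale = np.ones((num_envs, 4))
    scale[np.arange(num_envs), failed_rotor] = 1.0 - loss
    return scale

def apply_rotor_commands(cmd, scale):
    """Delivered thrust = commanded thrust times each rotor's health factor."""
    return cmd * scale

rng = np.random.default_rng(0)
mass = sample_mass_tasks(rng, NUM_ENVS)
scale = sample_thrust_loss_tasks(rng, NUM_ENVS)
delivered = apply_rotor_commands(np.full((NUM_ENVS, 4), 0.25), scale)
print(round(NOMINAL_MASS_KG * (1 + 2 / 3), 3))  # 0.55
```

Sampling every environment's task in one array operation, rather than in a Python loop, is what makes thousands of parallel tasks cheap. In a GPU simulator, the same pattern would run on device tensors instead.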

Paper Structure

This paper contains 19 sections, 13 equations, 10 figures, 3 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Demonstration of our policy performing three consecutive flights in the mass variation scenario without landing. (a) We alter the quadrotor mass with a magnet payload. (b) Illustration of a quadrotor with varying masses.
  • Figure 2: Overview of our meta-RL framework for online adaptation to quadrotor dynamics. The policy is trained in parallelized simulation environments on tasks with varying dynamics (mass and thrust loss). A novel predictive context encoder learns to infer a latent variable that conditions the policy network, enabling task-aware adaptation. For deployment, the trained policy performs real-time task inference and navigation on an onboard computer.
  • Figure 3: Comparison of trajectories on a switchback track for quadrotors with varying mass (260 g, 330 g, 440 g, and 550 g). Shaded areas denote waypoints with a radius of 1.0 m. Spots and stars denote the starting and ending waypoints, respectively. Although both RL with domain randomization (RL-DR) and our method use a single policy, our approach yields trajectories similar to those of the mass-specific RL policies, whereas RL-DR exhibits unnecessary detours at certain masses.
  • Figure 4: Flight trajectories and completion times for our policy and two baselines (mass-specific RL and RL-DR) on two challenging tracks. Our method's performance closely resembles that of the mass-specific expert, avoiding the inefficient detours and braking exhibited by RL-DR.
  • Figure 5: Trajectories of our method under varying degrees of thrust loss, with a standard RL method as a baseline.
  • ...and 5 more figures
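Figure 2 describes deployment as real-time task inference plus navigation on an onboard computer. The sketch below shows that loop under several assumptions. A fixed-length sliding window of recent transitions feeds the encoder, z is re-inferred at every control step, and the policy acts on the observation concatenated with z. It also includes a waypoint check that treats the 1.0 m radius shown in Figure 3 as the pass criterion, which is an assumption. HistoryBuffer, infer_latent, policy, reached, stand_in_encoder, toy_step, and the window length are all hypothetical. The toy point-mass dynamics, stand-in encoder, and random linear policy are placeholders for the simulator and trained networks.

```python
from collections import deque

import numpy as np

class HistoryBuffer:
    """Sliding window of the most recent (s, a, s') transitions."""
    def __init__(self, maxlen):
        self.buf = deque(maxlen=maxlen)

    def push(self, s, a, s_next):
        self.buf.append((s, a, s_next))

    def arrays(self):
        s, a, s_next = zip(*self.buf)
        return np.stack(s), np.stack(a), np.stack(s_next)

def infer_latent(history, encode_fn, z_dim):
    """Task inference: z from the window, or zeros before any transition exists."""
    if not history.buf:
        return np.zeros(z_dim)
    return encode_fn(*history.arrays())

def policy(obs, z, weight):
    """Hypothetical linear policy on [obs, z]; tanh bounds actions to [-1, 1]."""
    return np.tanh(np.concatenate([obs, z]) @ weight)

def reached(position, waypoint, radius=1.0):
    """Assumed pass criterion: inside the 1.0 m waypoint radius from Figure 3."""
    return bool(np.linalg.norm(position - waypoint) <= radius)

def stand_in_encoder(s, a, s_next, z_dim=3):
    """Placeholder for a trained encoder: pooled transition features, squashed."""
    feats = np.concatenate([s, a, s_next - s], axis=1).mean(axis=0)
    return np.tanh(feats[:z_dim])

def toy_step(s, a, dt=0.05):
    """Toy point-mass dynamics standing in for the simulator or real vehicle."""
    pos, vel = s[:3], s[3:]
    vel = vel + dt * a[:3]
    return np.concatenate([pos + dt * vel, vel])

S, A, Z = 6, 4, 3
weight = 0.1 * np.random.default_rng(0).standard_normal((S + Z, A))
history = HistoryBuffer(maxlen=16)
s = np.zeros(S)
for _ in range(50):
    z = infer_latent(history, stand_in_encoder, Z)  # re-infer z every step
    a = policy(s, z, weight)
    s_next = toy_step(s, a)
    history.push(s, a, s_next)
    s = s_next
```

Because the deque discards old transitions automatically, z tracks the most recent behavior of the vehicle. This is the property an online adaptation scheme would rely on when, for example, a payload is attached mid-flight.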