MAVRL: Learn to Fly in Cluttered Environments with Varying Speed

Hang Yu, Christophe De Wagter, Guido C. H. E. de Croon

TL;DR

MAVRL tackles obstacle avoidance for drones in cluttered environments by coupling a memory-augmented latent representation of depth images with a reinforcement learning policy that varies its flight speed. Depth images are encoded by a VAE into a latent vector of dimension $N_e=64$, which an LSTM then processes into a memory-rich latent $\mathbf{z}_t$ of dimension $N_l=256$. This latent supports reconstruction of past, present, and (less accurately) future depth maps. The policy, trained with PPO, takes this latent together with the drone's state and target information and outputs 3D acceleration commands, which an MPC converts into body-rate commands. The reward structure favors safe, efficient progress and smooth control. Simulations in AvoidBench with progressively increasing obstacle density show that the memory-augmented, varying-speed approach outperforms fixed-speed baselines and existing methods such as Agile-Autonomy. Real-world deployment requires only minimal fine-tuning of the perception modules. The work advances practical autonomous flight by combining memory, adaptive speed, and a model-based (MPC) control loop for robust obstacle avoidance in real-world clutter.
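
The data flow from a per-frame VAE latent ($N_e=64$) to the memory latent $\mathbf{z}_t$ ($N_l=256$) can be illustrated with a standard LSTM cell, which carries information across frames through its cell state. The pure-Python sketch below is a minimal illustration under stated assumptions. The weights are random placeholders rather than trained parameters. The hidden state $h_t$ is taken as $\mathbf{z}_t$, though the paper may read out the latent differently. The names `LSTMCell` and `encode_sequence` are hypothetical and are not taken from the MAVRL code.

```python
# Hypothetical sketch: a standard LSTM cell folding VAE latents into a memory
# latent. Random placeholder weights, not MAVRL's trained network.
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
GATES = ("input", "forget", "cell", "output")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LSTMCell:
    """Maps a VAE latent e_t (size n_in) plus the previous state to a new
    state of size n_hidden (defaults mirror N_e=64 and N_l=256)."""

    def __init__(self, n_in: int = 64, n_hidden: int = 256, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        scale = 1.0 / math.sqrt(n_in + n_hidden)
        # One weight matrix and one bias vector per gate, acting on [e_t, h_{t-1}].
        self.weights: Dict[str, List[Vector]] = {
            gate: [[rng.uniform(-scale, scale) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
            for gate in GATES
        }
        self.biases: Dict[str, Vector] = {gate: [0.0] * n_hidden for gate in GATES}

    def _affine(self, gate: str, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, e_t: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        if len(e_t) != self.n_in or len(h) != self.n_hidden or len(c) != self.n_hidden:
            raise ValueError("dimension mismatch")
        x = e_t + h  # list concatenation: [e_t, h_{t-1}]
        i = [sigmoid(v) for v in self._affine("input", x)]
        f = [sigmoid(v) for v in self._affine("forget", x)]
        g = [math.tanh(v) for v in self._affine("cell", x)]
        o = [sigmoid(v) for v in self._affine("output", x)]
        # The forget gate keeps part of the old memory; the input gate writes new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_sequence(cell: LSTMCell, latents: List[Vector]) -> Vector:
    """Run the cell over e_1..e_T from a zero state and return the final
    hidden state, used here (as an assumption) as the memory latent z_t."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for e_t in latents:
        h, c = cell.step(e_t, h, c)
    return h
```

Because the cell state is carried from step to step, two sequences that end with the same frame but differ earlier produce different latents. This is the property that lets the representation retain memory of previous observations. In the full pipeline, $\mathbf{z}_t$ would then be concatenated with the drone's state and target information before entering the PPO policy. That step and the VAE encoder are omitted here.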

Abstract

Many existing obstacle avoidance algorithms overlook the crucial balance between safety and agility, especially in environments of varying complexity. We introduce an obstacle avoidance pipeline based on reinforcement learning that enables drones to adapt their flight speed to the complexity of the environment. To further improve obstacle avoidance in cluttered environments, we propose a novel latent representation that is explicitly trained to retain memory of previous depth map observations. Our results confirm that varying speed yields a better balance between success rate and agility in cluttered environments than flying at a fixed speed. Additionally, our memory-augmented latent representation outperforms the latent representation commonly used in reinforcement learning. Finally, after minimal fine-tuning, we successfully deployed our network on a real drone for obstacle avoidance.
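
A "better balance of success rate and agility" can be made precise with Pareto dominance. One configuration dominates another if it is at least as fast and at least as successful, and strictly better in one of the two; the Pareto frontier is the set of configurations that no other configuration dominates. The sketch below extracts that set from (average flight speed, success rate) pairs. The `Result` container, the function names, and all numbers in the usage example are hypothetical and are not results from the paper.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One evaluated configuration (hypothetical container)."""
    label: str
    speed: float    # average flight speed, m/s
    success: float  # success rate in [0, 1]


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.speed >= b.speed and a.success >= b.success
    strictly_better = a.speed > b.speed or a.success > b.success
    return no_worse and strictly_better


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Return the non-dominated results, sorted by increasing speed."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.speed)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the paper.
    runs = [
        Result("fixed-slow", 1.0, 0.95),
        Result("fixed-fast", 3.0, 0.60),
        Result("varying", 2.5, 0.85),
        Result("dominated", 2.0, 0.80),
    ]
    print([r.label for r in pareto_frontier(runs)])
    # prints ['fixed-slow', 'varying', 'fixed-fast']
```

A quadratic scan is sufficient for the handful of configurations in a benchmark sweep. A policy "improves the balance" in this sense if its point pushes the frontier outward, i.e. it dominates at least one previously frontier-optimal configuration.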

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures, and 2 tables.

Introduction
Related Work
Learning-based Obstacle Avoidance
Latent Representations
Learning a Memory-augmented Representation
Encoding Depth Images
Memory-augmented Latent Representation
Reinforcement Learning for Obstacle Avoidance
Problem Formulation
Reward Functions
Training in Varying Complexity Environments
Experiments
Latent Representation
Benchmarking for Varying Speed Policy
Real World Tests
...and 1 more section

Figures (7)

Figure 1: (a) The basic framework of MAVRL. (b) Drone trajectories in a cluttered environment. Fixed-speed flight often results in collisions with large obstacles, and the absence of augmented memory leads to frequent entrapment by such obstacles. In contrast, MAVRL-equipped flights navigate complex terrain safely and efficiently.
Figure 2: (a) MAVRL's network architecture. The depth image is encoded into a latent vector by a VAE and then processed by an LSTM to create a memory-augmented representation. This representation, combined with the drone's state and target data, is used by the PPO policy to produce the acceleration command. The part of the network within the red dotted box is used only for LSTM training and is not deployed on the drone. (b) Original and reconstructed depth images from the latent space $\mathbf{z}_t$. Past and current images are reconstructed better than future ones (which have the highest MAE loss), reflecting the difficulty of predicting future states.
Figure 3: (a) The drone's coordinate system, showing the bearing angle $\beta$ between the body axis $x_b$ and the target vector, and the track angle $\chi$, defined as the direction of the horizontal velocity in the world frame. (b) An adaptive drone trajectory in a cluttered environment: the drone decelerates when navigating complex obstacles and accelerates in simpler scenarios, adjusting its speed to the obstacle density. (c) The drone's average speed in response to uniformly bright gray images (green diamond), and the mean and standard deviation of the speed when the inputs are the actual depth images from (b) (red circle).
Figure 4: Success rates of $I_t$, $I_{t+10}$, and the better-performing combinations $I_t \& I_{t-20}$ and $I_t \& I_{t+10}$. The shaded area represents the standard deviation.
Figure 5: (a) Success rate of two MAVRL versions and Agile-Autonomy. (b) Average goal velocity (AGV) of two MAVRL versions and Agile-Autonomy. (c) Pareto frontier of success rate versus average flight speed.
...and 2 more figures
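
Figure 3(a) defines two angles: the bearing angle $\beta$ between the body axis $x_b$ and the target vector, and the track angle $\chi$, the direction of the horizontal velocity in the world frame. The sketch below computes both. It assumes a horizontal-plane (2D) treatment in which the horizontal direction of $x_b$ is given by the yaw angle $\psi$, and angles are measured counterclockwise from the world x-axis and wrapped to $[-\pi, \pi]$. These sign conventions are assumptions and may differ from the paper's. The function names are hypothetical.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]


def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def track_angle(vx: float, vy: float) -> float:
    """Track angle chi: direction of the horizontal velocity in the world frame (rad)."""
    return math.atan2(vy, vx)


def bearing_angle(position: Point2, yaw: float, target: Point2) -> float:
    """Bearing angle beta between the body x-axis and the vector to the target (rad).

    Assumes x_b projects onto the horizontal plane at heading `yaw`, and that a
    positive beta means the target lies counterclockwise of x_b.
    """
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    return wrap_to_pi(math.atan2(dy, dx) - yaw)


if __name__ == "__main__":
    # Drone at the origin facing +x, target ahead and to the left: beta = +45 deg.
    print(round(math.degrees(bearing_angle((0.0, 0.0), 0.0, (1.0, 1.0))), 6))
    # Flying along +y in the world frame: chi = 90 deg.
    print(round(math.degrees(track_angle(0.0, 2.0)), 6))
```

Wrapping matters for $\beta$: subtracting two angles from `atan2` can fall outside $[-\pi, \pi]$ (for example, yaw $3\pi/4$ and target direction $-3\pi/4$ give $-3\pi/2$), and the wrap maps this to the equivalent $\pi/2$.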
