MAVRL: Learn to Fly in Cluttered Environments with Varying Speed
Hang Yu, Christophe De Wagter, Guido C. H. E. de Croon
TL;DR
MAVRL tackles obstacle avoidance for drones in cluttered environments by coupling a memory-augmented latent representation of depth images with a varying-speed reinforcement learning policy. Depth images are encoded by a VAE into an embedding of dimension $N_e=64$, which an LSTM then processes to produce a memory-rich latent $z_t$ of dimension $N_l=256$, enabling reconstruction of past, present, and some future depth maps. The policy, trained with PPO, uses this latent along with state and target information to output 3D acceleration commands, which are translated into body-rate commands via MPC; the reward structure emphasizes safe, efficient progress and smooth control. AvoidBench-based simulations with progressively increasing obstacle density show that the memory-augmented, varying-speed approach outperforms fixed-speed baselines and existing methods, and real-world deployment requires only minimal fine-tuning of the perception modules. This work advances practical autonomous flight by integrating memory, adaptive speed, and a physics-informed control loop for robust obstacle avoidance in real-world clutter.
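The data flow summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weights are random and untrained, a linear map stands in for the VAE encoder, and the 32x32 depth resolution, the `STATE_DIM` value, the tanh bound on the output, and all function names are assumptions made for illustration. Only the sizes $N_e=64$ and $N_l=256$ and the 3D acceleration output come from the text. The MPC stage that converts accelerations into body rates is omitted.

```python
import numpy as np

N_E = 64        # VAE embedding size (from the text)
N_L = 256       # LSTM latent size (from the text)
IMG = (32, 32)  # assumed depth-image resolution (hypothetical)
STATE_DIM = 9   # assumed size of the state + target vector (hypothetical)

rng = np.random.default_rng(0)
# Random, untrained weights: placeholders for the learned modules.
W_enc = rng.normal(0.0, 0.01, (N_E, IMG[0] * IMG[1]))
W_lstm = rng.normal(0.0, 0.01, (4 * N_L, N_E + N_L))
b_lstm = np.zeros(4 * N_L)
W_pi = rng.normal(0.0, 0.01, (3, N_L + STATE_DIM))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode_depth(depth):
    """Linear stand-in for the VAE encoder: depth image -> N_E embedding."""
    return W_enc @ depth.ravel()


def lstm_step(x, h, c):
    """One standard LSTM cell update; h plays the role of the latent z_t."""
    gates = W_lstm @ np.concatenate([x, h]) + b_lstm
    i, f, g, o = np.split(gates, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


def run_pipeline(frames, state):
    """Feed a depth sequence through encoder and LSTM, then query the policy."""
    h = np.zeros(N_L)
    c = np.zeros(N_L)
    for depth in frames:
        h, c = lstm_step(encode_depth(depth), h, c)
    # Linear policy head with a tanh bound (an assumption of this sketch).
    accel = np.tanh(W_pi @ np.concatenate([h, state]))
    return h, accel


frames = rng.random((5,) + IMG)  # five synthetic depth frames
z_t, accel_cmd = run_pipeline(frames, np.zeros(STATE_DIM))
print(z_t.shape, accel_cmd.shape)
```

Because the LSTM state is carried across frames, `z_t` depends on the whole depth sequence rather than only the latest image, which is the property the memory-augmented latent relies on.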
Abstract
Many existing obstacle avoidance algorithms overlook the crucial balance between safety and agility, especially in environments of varying complexity. In our study, we introduce an obstacle avoidance pipeline based on reinforcement learning. This pipeline enables drones to adapt their flying speed according to the environmental complexity. Moreover, to improve the obstacle avoidance performance in cluttered environments, we propose a novel latent space that is explicitly trained to retain memory of previous depth map observations. Our findings confirm that varying speed leads to a superior balance of success rate and agility in cluttered environments. Additionally, our memory-augmented latent representation outperforms the latent representation commonly used in reinforcement learning. Finally, after minimal fine-tuning, we successfully deployed our network on a real drone for enhanced obstacle avoidance.
