Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding

Mihir Kulkarni, Kostas Alexis

TL;DR

The paper tackles collision-free autonomous flight for aerial robots in cluttered, GPS-denied environments without relying on global maps. It introduces a modular pipeline in which a deep collision encoder compresses depth images to a $64$-dimensional latent representation. This latent vector, together with odometry and target information, feeds a deep reinforcement learning (DRL) policy trained with Asynchronous Proximal Policy Optimization (APPO) for real-time navigation. Key contributions include task-driven depth compression with supervised training on both simulated and real data, a generalizable RL navigation policy, and sim-to-real validation across simulation and real-world experiments with low-latency onboard inference. The approach demonstrates robust navigation in unseen clutter and under sensor noise.

Abstract

This work contributes a novel deep navigation policy that enables collision-free flight of aerial robots based on a modular approach exploiting deep collision encoding and reinforcement learning. The proposed solution builds upon a deep collision encoder that is trained on both simulated and real depth images using supervised learning, such that it compresses the high-dimensional depth data to a low-dimensional latent space encoding collision information while accounting for the robot size. This compressed encoding is combined with an estimate of the robot's odometry and the desired target location to train a deep reinforcement learning navigation policy that offers low-latency computation and robust sim2real performance. A set of simulation and experimental studies in diverse environments is conducted, demonstrating the efficiency of the emergent behavior and its resilience in real-life deployments.
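
The data flow described in the abstract (latent collision encoding plus odometry plus target location as the policy input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `RobotState` fields, the ordering of the observation vector, and the helper name `build_observation` are hypothetical, and the latent size of 64 is the only value taken from the summary.

```python
from dataclasses import dataclass
from typing import List, Sequence

LATENT_DIM = 64  # latent size reported in the summary


@dataclass
class RobotState:
    """Hypothetical odometry container; the field layout is an assumption."""
    position: Sequence[float]  # x, y, z in a fixed frame
    velocity: Sequence[float]  # vx, vy, vz
    yaw: float                 # heading in radians


def build_observation(latent: Sequence[float],
                      state: RobotState,
                      target: Sequence[float]) -> List[float]:
    """Concatenate target offset, odometry, and collision latent.

    The ordering used here is an illustrative choice, not the paper's.
    """
    if len(latent) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} latent values, got {len(latent)}")
    # Express the target relative to the robot so the policy input does not
    # depend on where the episode happens to start.
    offset = [t - p for t, p in zip(target, state.position)]
    return offset + list(state.velocity) + [state.yaw] + list(latent)


state = RobotState(position=[1.0, 2.0, 3.0], velocity=[0.1, 0.0, -0.1], yaw=0.5)
obs = build_observation([0.0] * LATENT_DIM, state, target=[4.0, 2.0, 3.0])
print(len(obs))
```

Keeping the encoder output as a fixed-size vector is what lets the policy stay small: the policy network only ever sees a short, flat observation rather than a full depth image.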

Paper Structure (17 sections, 5 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Instances of two experiments demonstrating the abilities of the navigation policy, which exploits deep collision encoding and is trained with reinforcement learning. When allowed and space is available, the policy selects the intuitive behavior of flying above all obstacles (right); when the robot's altitude is constrained, it maneuvers through highly cluttered settings (left).
  • Figure 2: Overview of the Deep Collision Encoder (DCE) used to derive a low-dimensional latent space that retains collision information from depth images. The DCE is trained using supervised learning on a dataset of both synthetic and real depth images. The depth images are transformed to collision images that account for the size of the robot. The underlying DNN uses an architecture motivated by variational autoencoders, but here the "encoder" maps the depth image to the latent space and the "decoder" reconstructs the collision image from that latent space.
  • Figure 3: Overview of the interface between the framework for RL agent training, the Aerial Gym Simulator, and the DCE.
  • Figure 4: Indicative simulation studies using the Aerial Gym simulator to evaluate the trained policy against increasingly complex environments.
  • Figure 5: Simulation studies using the Flightmare simulator, with the goal of evaluating the method under environment diversity relative to the training data, especially in the case of the forest. On the right, the commanded velocities ($v_z$ in blue, $v_y$ in magenta, which is zero, and $v_x$) and the yaw rate $\omega_z$ (green) are shown.
  • ...and 3 more figures
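
The Figure 2 caption notes that depth images are transformed to collision images that account for the robot size. The summary does not give the exact procedure, so the following is only one plausible sketch: each obstacle pixel is inflated in image space by a radius that shrinks with distance, and the stored range is reduced by the robot radius. The function name `collision_image`, the pinhole approximation, and all parameter names are assumptions made for illustration.

```python
import math
from typing import List


def collision_image(depth: List[List[float]],
                    focal_px: float,
                    robot_radius: float,
                    max_range: float) -> List[List[float]]:
    """Inflate obstacles by the robot radius (illustrative assumption).

    depth is a row-major grid of ranges in metres; values outside
    (0, max_range) are treated as free space.
    """
    h, w = len(depth), len(depth[0])
    out = [[max_range] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0.0 or d >= max_range:
                continue  # no obstacle return on this pixel
            # Pinhole approximation: a sphere of the robot's radius at
            # range d spans about focal_px * radius / d pixels.
            r_px = int(math.ceil(focal_px * robot_radius / d))
            # Range at which the robot's body would first touch the obstacle.
            safe = max(d - robot_radius, 0.0)
            for vv in range(max(0, v - r_px), min(h, v + r_px + 1)):
                for uu in range(max(0, u - r_px), min(w, u + r_px + 1)):
                    if (vv - v) ** 2 + (uu - u) ** 2 <= r_px ** 2:
                        out[vv][uu] = min(out[vv][uu], safe)
    return out


# A 5x5 image with free space everywhere except one obstacle at 2 m.
depth = [[10.0] * 5 for _ in range(5)]
depth[2][2] = 2.0
result = collision_image(depth, focal_px=4.0, robot_radius=0.5, max_range=10.0)
print(result[2])
```

In this toy case the single obstacle pixel grows into a small disc of reduced range, which is the kind of target an encoder would need to learn if it is to preserve thin obstacles that a plain reconstruction loss might smooth away.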