Agile in the Face of Delay: Asynchronous End-to-End Learning for Real-World Aerial Navigation

Yude Li, Zhexuan Zhou, Huizhe Li, Youmin Gong, Jie Mei

TL;DR

An asynchronous reinforcement learning framework is proposed that decouples perception and control, enabling a high-frequency policy to act on the latest IMU state for immediate reactivity, while incorporating perception features asynchronously.

Abstract

Robust autonomous navigation for Autonomous Aerial Vehicles (AAVs) in complex environments is a critical capability. However, modern end-to-end navigation faces a key challenge: the high-frequency control loop needed for agile flight conflicts with low-frequency perception streams, which are limited by sensor update rates and significant computational cost. This mismatch forces conventional synchronous models into undesirably low control rates. To resolve this, we propose an asynchronous reinforcement learning framework that decouples perception and control, enabling a high-frequency policy to act on the latest IMU state for immediate reactivity, while incorporating perception features asynchronously. To manage the resulting data staleness, we introduce a theoretically grounded Temporal Encoding Module (TEM) that explicitly conditions the policy on perception delays, complemented by a two-stage curriculum that ensures stable and efficient training. After validation in extensive simulations, our method was deployed in zero-shot sim-to-real transfer on an onboard NUC, where it sustains a 100 Hz control rate and demonstrates robust, agile navigation in cluttered real-world environments. Our source code will be released for community reference.
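To make the rate mismatch concrete, the following is a minimal sketch of how perception staleness evolves when a fast control loop reuses the most recent perception feature. The 100 Hz control rate comes from the abstract; the 10 Hz perception rate, the function name `simulate_async_loop`, and the simplification that a frame becomes available instantly (no processing latency) are assumptions made for illustration only, not details from the paper.

```python
def simulate_async_loop(control_hz=100, perception_hz=10, steps=30):
    """Return the age, in seconds, of the perception feature at each control tick.

    Assumption: a new perception frame arrives every control_hz // perception_hz
    control ticks and is usable immediately (processing latency is ignored).
    """
    ticks_per_frame = control_hz // perception_hz
    last_frame_tick = 0
    ages_s = []
    for tick in range(steps):
        if tick % ticks_per_frame == 0:
            # A fresh perception feature is available at this tick.
            last_frame_tick = tick
        # The policy acts on every tick, using whatever feature is newest.
        ages_s.append((tick - last_frame_tick) / control_hz)
    return ages_s


ages = simulate_async_loop()
print(max(ages))
```

Under these assumptions the feature age follows a sawtooth between 0 and just under one perception period. This varying delay is the quantity that a delay-conditioned policy would need to be told about.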

Paper Structure

This paper contains 20 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The proposed asynchronous framework validated through zero-shot sim-to-real flight in a dense forest. (a) The custom-built AAV navigating through cluttered trees. (b) The complete, collision-free trajectory of a 30-meter flight test through the forest.
  • Figure 2: Overview of the asynchronous framework. The low-frequency perception module converts LiDAR point clouds to a pseudo-image, which a CNN processes into a feature vector. The high-frequency control module concatenates this feature with the latest IMU state, previous action, desired speed, and a temporal encoding vector representing data staleness. An MLP policy then processes this combined state to generate high-frequency control commands.
  • Figure 3: Visualization of (a) a sample training environment and (b) the collision-free trajectory generated by the proposed end-to-end navigation model.
  • Figure 4: Evaluation of model success rate under challenging conditions.
  • Figure 5: Real-world flight validation of our asynchronous framework, demonstrating successful zero-shot sim-to-real transfer in cluttered environments. (a) The AAV navigating a dense indoor obstacle field ($0.25~\mathrm{m}^{-2}$). (b) The AAV's flight trajectory in a scenario involving a dynamic obstacle. (c) First-person and (d) third-person views of autonomous navigation through a dense forest ($0.18~\mathrm{m}^{-2}$). (e) Onboard visualization displaying the input LiDAR point cloud alongside the corresponding velocity command (blue arrow) generated by the policy.
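The Figure 2 caption describes the policy input as a concatenation of the perception feature, the latest IMU state, the previous action, the desired speed, and a temporal encoding of data staleness. A minimal sketch of that assembly step follows. The sinusoidal encoding is a generic stand-in chosen for illustration and is not the paper's Temporal Encoding Module, whose exact form is not given here; the function names, vector sizes, and the `max_period_s` parameter are likewise hypothetical.

```python
import math


def temporal_encoding(delay_s, dim=4, max_period_s=1.0):
    """Encode a perception delay as sin/cos pairs at doubling frequencies.

    Stand-in encoding for illustration only; not the paper's TEM.
    """
    enc = []
    for i in range(dim // 2):
        period = max_period_s / (2 ** i)
        angle = 2.0 * math.pi * delay_s / period
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc


def build_policy_input(feature, imu_state, prev_action, desired_speed, delay_s):
    """Concatenate the inputs listed in the Figure 2 caption into one flat vector."""
    return (
        list(feature)
        + list(imu_state)
        + list(prev_action)
        + [desired_speed]
        + temporal_encoding(delay_s)
    )


# Hypothetical sizes: 8-dim CNN feature, 6-dim IMU state, 3-dim action.
state = build_policy_input([0.0] * 8, [0.0] * 6, [0.0] * 3, 2.0, delay_s=0.05)
print(len(state))
```

In this sketch, only the IMU state and the delay change on every control tick, while the perception feature is reused until a newer one arrives. The flat vector would then be passed to the MLP policy mentioned in the caption.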