Vision-Based End-to-End Learning for UAV Traversal of Irregular Gaps via Differentiable Simulation

Linzuo Zhang, Yu Hu, Feng Yu, Yang Deng, Wenxian Yu, Danping Zou

Abstract

Navigation through narrow and irregular gaps is an essential skill for autonomous drones in applications such as inspection, search-and-rescue, and disaster response. However, traditional planning and control methods rely on explicit gap extraction and measurement, while recent end-to-end approaches often assume regularly shaped gaps, leading to poor generalization and limited practicality. In this work, we present a fully vision-based, end-to-end framework that maps depth images directly to control commands, enabling drones to traverse complex gaps in unseen environments. Operating in the Special Euclidean group SE(3), where position and orientation are tightly coupled, the framework leverages differentiable simulation, a Stop-Gradient operator, and a Bimodal Initialization Distribution to achieve stable traversal through consecutive gaps. Two auxiliary prediction modules (a gap-crossing success classifier and a traversability predictor) further enhance continuous navigation and safety. Extensive simulation and real-world experiments demonstrate the approach's effectiveness, generalization capability, and practical robustness.
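
The Figure 2 caption states that the gap-crossing detection module resets the policy hidden state to support continuous multi-gap traversal. A minimal sketch of that control flow is given below, assuming a recurrent policy and a per-frame crossing probability. The class `RecurrentPolicyStub`, the function `run_episode`, the hidden size, the leaky update rule, and the 0.5 threshold are all hypothetical placeholders for illustration, not the authors' network or parameters.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

HIDDEN_SIZE = 4  # hypothetical; the paper's hidden-state size is not given here


@dataclass
class RecurrentPolicyStub:
    """Toy stand-in for a recurrent policy (NOT the paper's architecture)."""
    hidden: List[float] = field(default_factory=lambda: [0.0] * HIDDEN_SIZE)

    def reset(self) -> None:
        # Forget everything accumulated while approaching the previous gap.
        self.hidden = [0.0] * HIDDEN_SIZE

    def step(self, depth_feature: float) -> float:
        # Leaky accumulation so the hidden state carries memory across frames.
        self.hidden = [0.9 * h + 0.1 * depth_feature for h in self.hidden]
        # Placeholder scalar "command"; a real policy would output a control vector.
        return sum(self.hidden) / len(self.hidden)


def run_episode(
    depth_features: Sequence[float],
    crossing_probs: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[List[float], List[int]]:
    """Run the policy frame by frame, resetting its memory after each detected crossing."""
    policy = RecurrentPolicyStub()
    commands: List[float] = []
    reset_steps: List[int] = []
    for t, (feature, p_cross) in enumerate(zip(depth_features, crossing_probs)):
        commands.append(policy.step(feature))
        if p_cross > threshold:
            policy.reset()
            reset_steps.append(t)
    return commands, reset_steps
```

Under these assumptions, the command issued on the first frame after a reset matches the command on the first frame of the episode, so each gap is approached from the same internal starting point.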

Paper Structure

This paper contains 23 sections, 15 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Visualization of the training and real-world evaluation: The top row shows depth image sequences collected during training; the middle row shows depth images captured by the real drone during execution; the bottom row illustrates the real-world trajectory, where the drone first flies through a regular gate similar to the training scenarios and then successfully navigates an irregular, previously unseen gap, demonstrating the policy’s generalization ability.
  • Figure 2: System Overview. An end-to-end policy maps depth images directly to control commands and is trained via differentiable simulation, enabling direct back-propagation of task losses to the network. A gap-crossing detection module resets the policy hidden state to support continuous multi-gap traversal, while a traversability prediction module improves safety in challenging environments. A bimodal initialization distribution stabilizes training and enhances robustness across successive gaps.
  • Figure 3: Mesh-based depth renderer. A high-speed CUDA-based renderer generates depth images from mesh geometries. Domain randomization is applied by perturbing the angles and corner vertices, producing diverse gap configurations for training.
  • Figure 4: AirSim Simulation environments. (a) Single-gap scenario, where the quadrotor flies through a single tilted gap. (b) Multi-gap scenario, where the quadrotor sequentially flies through multiple tilted gaps. (c) Wall-mounted gap scenario, where a square opening is embedded in a planar wall.
  • Figure 5: Baseline comparison in AirSim. We compare our end-to-end vision-based policy with two baselines using two depth inputs: ground-truth (GT) and Semi-Global Matching (SGM). The baselines are: (1) a PPO-based policy with an edge-drawing front end, and (2) a state-of-the-art vision-based navigation method [Zhang2024BackTN].
  • ...and 4 more figures
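
The Figure 3 caption describes domain randomization that perturbs the angles and corner vertices of the gap meshes. The sketch below illustrates one way such a perturbation could look for a single quadrilateral gap, working in 2D within the gap plane. The function name `random_gap_corners`, the nominal dimensions, the jitter range, and the treatment of "angle" as an in-plane tilt are assumptions made for illustration; the caption does not specify them.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def random_gap_corners(
    width: float = 1.0,              # assumed nominal gap width (m)
    height: float = 0.6,             # assumed nominal gap height (m)
    max_corner_jitter: float = 0.1,  # assumed per-axis corner offset bound (m)
    max_tilt_deg: float = 30.0,      # assumed in-plane tilt bound (degrees)
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Return four corners of a randomly perturbed, tilted quadrilateral gap."""
    rng = rng or random.Random()
    hw, hh = width / 2.0, height / 2.0
    # Nominal rectangle, counter-clockwise from the bottom-left corner.
    nominal = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
    # Perturb each corner independently so the opening becomes irregular.
    jittered = [
        (x + rng.uniform(-max_corner_jitter, max_corner_jitter),
         y + rng.uniform(-max_corner_jitter, max_corner_jitter))
        for x, y in nominal
    ]
    # Rotate the whole shape about its center by a random tilt angle.
    tilt = math.radians(rng.uniform(-max_tilt_deg, max_tilt_deg))
    c, s = math.cos(tilt), math.sin(tilt)
    return [(c * x - s * y, s * x + c * y) for x, y in jittered]
```

Passing a seeded `random.Random` makes the sampled gaps reproducible. In a full pipeline the resulting corners would be turned into mesh geometry before rendering, a step omitted here.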