High-Speed Vision-Based Flight in Clutter with Safety-Shielded Reinforcement Learning

Jiarui Zhang; Chengyong Lei; Chengjiang Dai; Lijie Wang; Zhichao Han; Fei Gao

TL;DR

This work addresses fast, safe quadrotor navigation in cluttered environments by combining end-to-end reinforcement learning with model-based safety mechanisms. It introduces geodesic-based reward shaping during training and a real-time safety filter based on high-order control barrier functions (HOCBFs) during deployment, enabling high-speed flight under formal collision-avoidance constraints. Key contributions include a Dijkstra-derived navigation potential, safety shaping based on a Euclidean signed distance field (ESDF), an HOCBF-corrected action layer, and sim-to-real transfer validated in simulation and in real-world flights at up to $7.5$ m/s. The approach outperforms traditional planners and learning baselines, pointing to a practical path toward agile yet safe autonomous flight in real environments.
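
The Dijkstra-derived navigation potential mentioned above can be illustrated with a minimal sketch. It assumes a 2D occupancy grid with 4-connected unit-cost moves, which is a simplification and not necessarily the map representation used in the paper; the function names `geodesic_potential` and `progress_reward` and the `scale` parameter are hypothetical. The idea is that the shortest-path distance to the goal, rather than the straight-line distance, serves as a potential, and the agent is rewarded for reducing it.

```python
import heapq
import math


def geodesic_potential(grid, goal):
    """Dijkstra shortest-path distance from every free cell to `goal`.

    grid: list of rows, 0 = free, 1 = obstacle (hypothetical 2D map).
    goal: (row, col) of a free cell.
    Returns a same-shaped list of distances; unreachable cells stay inf.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0.0
    heap = [(0.0, goal)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connected, unit cost
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1.0
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def progress_reward(dist, prev_cell, cell, scale=1.0):
    """Reward the decrease in geodesic distance between two steps."""
    return scale * (dist[prev_cell[0]][prev_cell[1]] - dist[cell[0]][cell[1]])


# A wall separates the start (top-left) from the goal (bottom-left).
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
dist = geodesic_potential(grid, goal=(2, 0))
```

In this example the straight-line distance from the top-left cell to the goal is 2 cells, but the geodesic distance is 6 because the path must go around the wall. A reward built on the geodesic distance therefore does not pull the agent into the obstacle.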

Abstract

Quadrotor unmanned aerial vehicles (UAVs) are increasingly deployed in complex missions that demand reliable autonomous navigation and robust obstacle avoidance. However, traditional modular pipelines often incur cumulative latency, whereas pure reinforcement learning (RL) approaches typically provide limited formal safety guarantees. To bridge this gap, we propose an end-to-end RL framework augmented with model-based safety mechanisms. We incorporate physical priors in both training and deployment. During training, we design a physics-informed reward structure that provides global navigational guidance. During deployment, we integrate a real-time safety filter that projects the policy outputs onto a provably safe set to enforce strict collision-avoidance constraints. This hybrid architecture reconciles high-speed flight with robust safety assurances. Benchmark evaluations demonstrate that our method outperforms both traditional planners and recent end-to-end obstacle avoidance approaches based on differentiable physics. Extensive experiments demonstrate strong generalization, enabling reliable high-speed navigation in dense clutter and challenging outdoor forest environments at velocities up to 7.5 m/s.
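
The projection step described in the abstract can be sketched for a much simpler system than the one in the paper. The example below assumes a point mass with acceleration control (a double integrator), a single spherical obstacle, and the barrier $h(p) = \lVert p - c \rVert^2 - r^2$, which has relative degree 2 and therefore calls for a high-order control barrier function. The function name `hocbf_filter`, the gains `alpha1` and `alpha2`, and the obstacle model are all assumptions made for illustration; the paper's filter acts on the policy's attitude and thrust commands and its exact formulation is not given here. With one linear constraint, the minimum-deviation quadratic program has a closed-form solution, so no solver is needed.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def hocbf_filter(p, v, a_nom, center, radius, alpha1=2.0, alpha2=2.0):
    """Minimally modify a_nom so that the second-order barrier condition holds.

    Barrier:    h = |p - c|^2 - r^2          (h >= 0 means collision-free)
    Condition:  h_ddot + (alpha1 + alpha2) * h_dot + alpha1 * alpha2 * h >= 0
    For a double integrator this is linear in the acceleration a: A . a >= b.
    Assumes p != center, so that A is non-zero.
    """
    d = [pi - ci for pi, ci in zip(p, center)]
    h = dot(d, d) - radius ** 2
    h_dot = 2.0 * dot(d, v)
    # h_ddot = 2 v.v + 2 d.a, so the control enters through A = 2 d.
    A = [2.0 * di for di in d]
    b = -(2.0 * dot(v, v) + (alpha1 + alpha2) * h_dot + alpha1 * alpha2 * h)
    slack = dot(A, a_nom) - b
    if slack >= 0.0:
        return list(a_nom)  # nominal command already satisfies the constraint
    # Closed-form projection of a_nom onto the half-space A . a >= b.
    step = -slack / dot(A, A)
    return [ai + step * Ai for ai, Ai in zip(a_nom, A)]


# Flying toward the obstacle at 2 m/s while commanding more forward acceleration.
a_safe = hocbf_filter(p=[-3.0, 0.0, 0.0], v=[2.0, 0.0, 0.0],
                      a_nom=[1.0, 0.0, 0.0],
                      center=[0.0, 0.0, 0.0], radius=1.0)

# Hovering and accelerating sideways: the command passes through unchanged.
a_free = hocbf_filter(p=[-3.0, 0.0, 0.0], v=[0.0, 0.0, 0.0],
                      a_nom=[0.0, 1.0, 0.0],
                      center=[0.0, 0.0, 0.0], radius=1.0)
```

In the first call the forward acceleration is replaced by a braking command along the line to the obstacle, and the lateral components are left untouched. In the second call the nominal command is already safe, so the filter returns it as-is, which is the property that lets such a filter stay out of the policy's way in open space.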

Paper Structure (20 sections, 8 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Modular Navigation for UAV
Learning-Based Methods
Methodology
Problem Formulation
Observation Space
Action Space
Reward Function Design
Network Architecture
Model-based Reward Shaping
HOCBF-based Correction Loop
Sim-to-Real Transfer
Experiments
Simulations
...and 5 more sections

Figures (7)

Figure 1: Snapshot of real-world indoor flight experiments. Flight trajectories recorded via long-exposure photography demonstrate the UAV's agile maneuvering capabilities during high-speed traversal between trees and artificial obstacles.
Figure 2: Network architecture and control pipeline. An asymmetric actor-critic policy fuses a depth image (CNN) and proprioceptive states via a GRU. The actor outputs attitude references and normalized thrust, which are tracked by a PX4 controller. During training, domain randomization (dropout/noise/delay) improves robustness; during deployment, a real-time HOCBF filter refines the raw command to enforce safety constraints.
Figure 3: Visualization of depth observations. (a) Raw depth image captured by an Intel RealSense D435i camera. (b) Processed image after applying the Navier-Stokes inpainting algorithm. (c) Synthetic depth image generated from the simulated sensor.
Figure 4: Training performance and environment. (a) Comparative training curves displaying success rates over iterations for different reward configurations. (b) Visualization of a representative training scenario populated with dense obstacles.
Figure 5: Benchmark comparisons against state-of-the-art methods. (a) Average realized velocity of Ego-Planner [zhou2020ego], DiffPhys [zhang2025learning], and our method across varying target speeds and environmental configurations. (b) Visualization of the resulting flight trajectories generated by each method in geometric clutter (left) and Perlin noise (right) scenarios.
...and 2 more figures
