Learning Speed Adaptation for Flight in Clutter

Guangyu Zhao; Tianyue Wu; Yeke Chen; Fei Gao

TL;DR

The paper tackles safe, efficient flight in unknown clutter by introducing a hierarchical framework where an outer-loop RL policy learns a speed constraint $v^{\dagger}$ to guide a robust model-based planner. This two-tier approach yields perception-aware, aggressive yet safe trajectories, outperforming constant-speed baselines and an EVA-planner in both simulation and real-world tests. The key novelty lies in the two-stage reward design that stabilizes learning amid stochastic termination and perception latency, coupled with a local occupancy-map representation that enables perception-aware behaviors. The results indicate significant practical impact for fast, reliable autonomous flight in cluttered environments and highlight avenues for giving the learned policy more control while preserving the safety guarantees of traditional planners.

Abstract

Animals learn to adapt the speed of their movements to their capabilities and to the environment they observe. Mobile robots should likewise be able to trade off aggressiveness against safety in order to accomplish tasks efficiently. The aim of this work is to endow flight vehicles with the ability to adapt their speed in previously unknown and partially observable cluttered environments. We propose a hierarchical learning and planning framework that combines well-established model-based trajectory generation with trial-and-error learning of a policy that dynamically configures the speed constraint. Technically, we use online reinforcement learning to obtain the deployable policy. Statistical results in simulation demonstrate the advantages of our method over constant-speed-constraint baselines and an alternative method in terms of flight efficiency and safety. In particular, the policy exhibits perception awareness, which distinguishes it from alternative approaches. By deploying the policy on hardware, we verify that these advantages carry over to the real world.
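The hierarchy described above can be illustrated with a minimal sketch: an outer-loop policy picks a speed constraint $v^{\dagger}$, and a planner produces a speed profile that respects it. Everything in the block is an assumption made for illustration. `toy_policy` is a hand-written stand-in for the learned RL policy, `plan_with_speed_limit` is a stand-in for the model-based trajectory planner, and the bounds `v_lo` and `v_hi` are hypothetical values not taken from the paper.

```python
from typing import List, Tuple


def toy_policy(free_ratio: float, v_lo: float = 1.0, v_hi: float = 5.0) -> float:
    """Hypothetical outer-loop policy (NOT the learned policy from the paper).

    Maps the fraction of free space in the local map to a speed constraint:
    more free space allows a higher limit.
    """
    free_ratio = max(0.0, min(1.0, free_ratio))  # clamp to [0, 1]
    return v_lo + (v_hi - v_lo) * free_ratio


def plan_with_speed_limit(desired_speeds: List[float], v_max: float) -> List[float]:
    """Stand-in for the model-based planner: clip each desired speed to v_max."""
    return [min(s, v_max) for s in desired_speeds]


def replanning_step(free_ratio: float,
                    desired_speeds: List[float]) -> Tuple[float, List[float]]:
    """One replanning cycle: the policy sets the constraint, the planner obeys it."""
    v_dagger = toy_policy(free_ratio)
    return v_dagger, plan_with_speed_limit(desired_speeds, v_dagger)


if __name__ == "__main__":
    # Dense clutter (no free space) versus open space (all free).
    print(replanning_step(0.0, [3.0, 0.5]))
    print(replanning_step(1.0, [3.0, 6.0]))
```

The point of the split is that the planner alone enforces the constraint it is given, so the policy can only change how aggressive the flight is, not bypass the planner.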

Paper Structure (29 sections, 11 equations, 9 figures)

Introduction
Related Work
Adaptive Motion Planning
Combining Learning and Model-based Planner for Navigation in Cluttered Environments
Problem Formulation
Framework
Model-based Trajectory Planner
Hierarchical Policy Optimization
System Overview
Reinforcement Learning for the Outer-loop Policy
Policy Representation
Observation Space
Early Termination
Reward Function
A human knowledge-based, dense reward function for pre-training
...and 14 more sections

Figures (9)

Figure 1: Overview of the system with the hierarchical policy.
Figure 2: Illustration of the policy architecture and observation implementation. The figure shows the network architecture of the policy, where the actor and critic share the CNN encoder and fusion layer. We also highlight the 3D occupancy map, in which each cell is assigned a state and whose x-y profiles are sampled as the input to the network.
Figure 3: Example illustration of environments for training. Areas in blue represent the space filled with obstacles.
Figure 4: Velocity distribution along the trajectory in different setups. The dark green curves are the reference trajectories. The colored curves are the trajectories flown by the vehicle, where the color represents the velocity. (a) An example result of the proposed approach. (b) An example result of the constant speed constraint $v^{\dagger}=2$ m/s. (c) An example result of the intermediate model before fine-tuning.
Figure 5: Statistical results of different setups. Success rates are computed over 50 trials for each setup. The light blue dashed lines connect the statistics at different levels of constant speed constraint, indicating the inherent capability of the system. Average velocities are computed over the successful trials among the 50.
...and 4 more figures
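
The statistics named in the Figure 5 caption (success rate over all trials, average velocity over successful trials only) can be computed as in the sketch below. The function name `summarize_trials` and the trial values are hypothetical and purely illustrative; they are not results from the paper.

```python
from typing import List, Optional, Tuple


def summarize_trials(trials: List[Tuple[bool, float]]) -> Tuple[float, Optional[float]]:
    """Return (success_rate, mean_velocity_over_successes).

    Each trial is (succeeded, average_velocity_in_m_per_s). Failed trials
    count toward the success rate but are excluded from the velocity mean.
    """
    if not trials:
        return 0.0, None
    successful = [v for ok, v in trials if ok]
    success_rate = len(successful) / len(trials)
    mean_velocity = sum(successful) / len(successful) if successful else None
    return success_rate, mean_velocity


if __name__ == "__main__":
    # Made-up example data: three successes and one failure.
    demo = [(True, 2.0), (True, 4.0), (False, 9.0), (True, 3.0)]
    print(summarize_trials(demo))
```

Excluding failed trials from the velocity mean keeps a fast but crashing run from inflating the reported speed, which is why the two metrics are read together.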
