
Time-optimal Flight in Cluttered Environments via Safe Reinforcement Learning

Wei Xiao, Zhaohan Feng, Ziyu Zhou, Jian Sun, Gang Wang, Jie Chen

TL;DR

The paper tackles time-optimal quadrotor flight through a sequence of waypoints in cluttered environments. It introduces a safe reinforcement learning framework whose reward combines a gate-progress term, a safety term based on obstacle proximity, and a terminal time penalty $\lambda_4 T$, encouraging fast, collision-free trajectories under a full 6-DoF quadrotor model. The policy is trained with PPO in Flightmare using parallel environments and domain randomization, and it reaches a $66.7\%$ success rate in unseen, dense obstacle configurations with flight times competitive with state-of-the-art methods. Ablation studies confirm that the safety and terminal rewards improve obstacle avoidance and time efficiency, highlighting practical potential for high-speed autonomous drone racing. The approach offers a scalable, sim-to-real-transferable solution for fast, safe flight in cluttered spaces.
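The reward composition summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: only the terminal penalty $\lambda_4 T$ comes from the text. The functional forms (progress as the decrease in distance to the next gate, a quadratic penalty inside a safety radius), the names `RewardWeights`, `step_reward`, `terminal_reward`, and `safe_radius`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # All values below are placeholders chosen for illustration only.
    progress: float = 1.0   # weight on the gate-progress term (assumed)
    safety: float = 0.5     # weight on the obstacle-proximity term (assumed)
    lambda_4: float = 0.1   # weight on the terminal time penalty lambda_4 * T


def step_reward(prev_dist_to_gate: float,
                dist_to_gate: float,
                dist_to_obstacle: float,
                w: RewardWeights,
                safe_radius: float = 1.0) -> float:
    """Per-step reward: progress toward the next gate minus a proximity penalty."""
    # Assumed progress term: how much closer the quadrotor got to the next gate.
    progress = prev_dist_to_gate - dist_to_gate
    # Assumed safety term: zero outside safe_radius, quadratic penalty inside it.
    violation = max(0.0, safe_radius - dist_to_obstacle)
    safety = -(violation ** 2)
    return w.progress * progress + w.safety * safety


def terminal_reward(flight_time: float, w: RewardWeights) -> float:
    """Terminal penalty -lambda_4 * T, applied once when the episode ends."""
    return -w.lambda_4 * flight_time


if __name__ == "__main__":
    w = RewardWeights()
    # Far from obstacles: only the progress term contributes.
    print(step_reward(5.0, 4.0, 2.0, w))
    # Inside the safety radius: the proximity penalty reduces the reward.
    print(step_reward(5.0, 4.0, 0.5, w))
    # Longer flights receive a larger terminal penalty.
    print(terminal_reward(10.0, w))
```

Separating the per-step terms from the terminal term mirrors the description above: the progress and safety terms shape behavior at every step, while the time penalty is charged once at the end of the flight so that shorter total times are preferred.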

Abstract

This paper addresses the problem of guiding a quadrotor through a predefined sequence of waypoints in cluttered environments, aiming to minimize the flight time while avoiding collisions. Previous approaches either suffer from prolonged computational time caused by solving complex non-convex optimization problems or are limited by the inherent smoothness of polynomial trajectory representations, thereby restricting the flexibility of movement. In this work, we present a safe reinforcement learning approach for autonomous drone racing with time-optimal flight in cluttered environments. The reinforcement learning policy, trained using safety and terminal rewards specifically designed to enforce near time-optimal and collision-free flight, outperforms current state-of-the-art algorithms. Additionally, experimental results demonstrate the efficacy of the proposed approach in achieving both minimum flight time and obstacle avoidance objectives in complex environments, with a commendable $66.7\%$ success rate in unseen, challenging settings.

Paper Structure (19 sections, 6 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: The proposed SRL framework
  • Figure 2: Trajectories generated by our method on the Split-S track with obstacles. The light gray areas represent obstacles. During each flight, the quadrotor must pass through $7$ waypoints, one of which has its position randomly initialized.
  • Figure 3: Trajectories generated by our approach in a level $1$ forest environment, passing through nine waypoints while avoiding obstacles. The light gray areas represent obstacles.