Self-organized free-flight arrival for urban air mobility

Martin Waltz, Ostap Okhrin, Michael Schultz

TL;DR

This work addresses safe and scalable terminal arrival management for urban air mobility in a circular vertiport airspace using a decentralized, self-organized approach. It introduces a shared LSTM-TD3-based policy within a CTDE MARL framework, trained via curriculum learning to handle progressively larger traffic while remaining decentralized. The observation space combines self-vehicle and surrounding-vehicle information, and the reward integrates collision avoidance, VTOL-zone entrance, airspace containment, and motion comfort. The approach demonstrates safe, efficient traffic flow in simulation, robustness to sensor noise and traffic distributions, and zero-shot Sim-to-Real transfer to five Crazyflie drones, highlighting practical viability for future UAM operations.
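The four reward ingredients named above can be pictured as terms of a single scalar signal. The following is only a minimal sketch under stated assumptions: the weighted-sum form, the `Weights` values, the binary and absolute-value term shapes, and every function and parameter name are hypothetical illustrations, not the reward actually used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the four reward terms (not from the paper)."""
    collision: float = 1.0
    entrance: float = 1.0
    containment: float = 1.0
    comfort: float = 0.1


def total_reward(min_separation, safe_distance, entered_vtol_zone,
                 dist_from_center, airspace_radius, heading_rate,
                 weights=None):
    """Combine the four ingredients into one scalar (assumed weighted sum)."""
    w = weights if weights is not None else Weights()

    # Collision avoidance: penalize falling below an assumed safe distance.
    r_collision = -1.0 if min_separation < safe_distance else 0.0
    # VTOL-zone entrance: reward the step on which the vehicle enters the zone.
    r_entrance = 1.0 if entered_vtol_zone else 0.0
    # Airspace containment: penalize leaving the circular airspace.
    r_containment = -1.0 if dist_from_center > airspace_radius else 0.0
    # Motion comfort: penalize large heading changes (assumed proxy).
    r_comfort = -abs(heading_rate)

    return (w.collision * r_collision
            + w.entrance * r_entrance
            + w.containment * r_containment
            + w.comfort * r_comfort)
```

In such a design the relative weights decide how strongly the policy trades smooth flight against fast entry into the VTOL zone; the values here are placeholders only.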

Abstract

Urban air mobility is an innovative mode of transportation in which electric vertical takeoff and landing (eVTOL) vehicles operate between nodes called vertiports. We outline a self-organized vertiport arrival system based on deep reinforcement learning. The airspace around the vertiport is assumed to be circular, and the vehicles can freely operate inside. Each aircraft is considered an individual agent and follows a shared policy, resulting in decentralized actions that are based on local information. We investigate the development of the reinforcement learning policy during training and illustrate how the algorithm moves from suboptimal local holding patterns to a safe and efficient final policy. The latter is validated in simulation-based scenarios, including robustness analyses against sensor noise and a changing distribution of inbound traffic. Lastly, we deploy the final policy on small-scale unmanned aerial vehicles to showcase its real-world usability.
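The idea of one shared policy producing decentralized actions from local information can be illustrated with a short sketch. Everything below is an assumption made for illustration: `placeholder_policy` is a hand-written stand-in (it simply points each vehicle at the vertiport center) rather than the trained network, and the observation layout and all names are hypothetical.

```python
import math


def local_observation(i, positions, center=(0.0, 0.0)):
    """Build agent i's local view: offset to the vertiport center plus the
    relative positions of all other vehicles, nearest first (assumed layout)."""
    xi, yi = positions[i]
    to_center = (center[0] - xi, center[1] - yi)
    others = sorted(
        ((xj - xi, yj - yi) for j, (xj, yj) in enumerate(positions) if j != i),
        key=lambda d: math.hypot(d[0], d[1]),
    )
    return to_center, others


def placeholder_policy(observation):
    """Stand-in for the learned policy: return a heading (rad) toward the center.
    It ignores the other vehicles, which a trained policy would not."""
    (dx, dy), _others = observation
    return math.atan2(dy, dx)


def act_all(positions, policy):
    """Apply the SAME policy to every agent, each on its own local observation."""
    return [policy(local_observation(i, positions)) for i in range(len(positions))]


positions = [(100.0, 0.0), (0.0, 100.0), (-100.0, 0.0)]
headings = act_all(positions, placeholder_policy)
```

Because each action depends only on that agent's own observation, the loop in `act_all` could run on board each vehicle separately; sharing the policy means there is only one set of parameters to train.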

Paper Structure (25 sections, 13 equations, 13 figures)

Figures (13)

  • Figure 1: Schematic illustration of the airspace design; inspired by bertram2020efficient.
  • Figure 1: Results of the simulation study with Policy IV under observations with small positional noise ($\sigma_{\rm N} = 10\,\mathrm{m}$).
  • Figure 2: Neural network architecture adapted from waltz20232. The symbol $\bowtie$ denotes concatenation.
  • Figure 2: Results of the simulation study with Policy IV under observations with medium positional noise ($\sigma_{\rm N} = 20\,\mathrm{m}$).
  • Figure 3: Training progress under different settings: 'CL' dynamically increases the number of vehicles, while 'No CL' considers 25 vehicles from the first training step. The 'CL: Selected run' is the run of the 'CL' setting which is used for the upcoming evaluations.
  • ...and 8 more figures