Graph Learning-based Fleet Scheduling for Urban Air Mobility under Operational Constraints, Varying Demand & Uncertainties
Steve Paul, Jhoel Witter, Souma Chowdhury
TL;DR
The paper formulates online fleet scheduling for urban air mobility (UAM) across a multi-vertiport network, under operational constraints and uncertainties, as a Markov decision process (MDP) with graph-structured state representations. It introduces CapTAIN, a centralized encoder-decoder policy that combines Graph Capsule Convolutional Networks for state embeddings, Transformer encoders for time-series information on demand and passenger fare, and a Multi-head Attention decoder, trained via Proximal Policy Optimization (PPO). The policy is integrated with a custom simulation environment and evaluated against non-learning baselines (a random policy and a genetic algorithm). On unseen test scenarios it achieves higher daily averaged profit than both baselines while executing nearly 1000× faster than the genetic algorithm. The work demonstrates the viability of graph-based RL with time-series context for scalable, online UAM planning under realistic constraints.
Abstract
This paper develops a graph reinforcement learning approach to online planning of the schedule and destinations of electric aircraft that comprise an urban air mobility (UAM) fleet operating across multiple vertiports. This fleet scheduling problem is formulated to consider time-varying demand; constraints related to vertiport capacity, aircraft capacity, and airspace safety guidelines; and uncertainties related to take-off delay, weather-induced route closures, and unanticipated aircraft downtime. Collectively, such a formulation presents greater complexity, and potentially increased realism, than existing UAM fleet planning implementations. To address these complexities, a new policy architecture is constructed, whose primary components include: graph capsule convolutional networks for encoding vertiport and aircraft-fleet states, both abstracted as graphs; transformer layers for encoding time-series information on demand and passenger fare; and a Multi-head Attention-based decoder that uses the encoded information to compute the probability of selecting each available destination for an aircraft. Trained with Proximal Policy Optimization, this policy architecture shows significantly better performance in terms of daily averaged profits on unseen test scenarios involving 8 vertiports and 40 aircraft, when compared to a random baseline and genetic algorithm-derived optimal solutions, while being nearly 1000 times faster in execution than the latter.
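To make the decoder step concrete, the following is a minimal sketch of how attention scores can be turned into a probability over the available destinations only. It is an illustrative simplification, not the paper's implementation: it uses a single attention head with hand-set vectors in place of the learned multi-head decoder, and the names `masked_destination_probs`, `query`, `keys`, and `available` are hypothetical. The aircraft context vector and vertiport embeddings stand in for the outputs of the graph and transformer encoders.

```python
import math


def masked_destination_probs(query, keys, available):
    """Return a probability for each candidate destination.

    query:     context vector for the deciding aircraft (assumed encoder output)
    keys:      one embedding per vertiport (assumed encoder output)
    available: booleans; False marks a destination that cannot be chosen,
               e.g. because of a route closure or a full vertiport
    """
    d = len(query)
    scores = []
    for key, ok in zip(keys, available):
        if ok:
            # Scaled dot-product compatibility between aircraft and vertiport.
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(dot / math.sqrt(d))
        else:
            # Masked destinations get -inf so they receive zero probability.
            scores.append(-math.inf)

    finite = [s for s in scores if s != -math.inf]
    if not finite:
        raise ValueError("no available destination")

    # Numerically stable softmax over the unmasked scores.
    top = max(finite)
    exps = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: three vertiports, the third one is unavailable.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
available = [True, True, False]
probs = masked_destination_probs(query, keys, available)
```

Here the third vertiport would have the highest raw score, but the mask forces its probability to zero, so the policy can only sample among feasible destinations. In a trained policy, the action sampled from this distribution and its log-probability would be what PPO uses in its update.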
