Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments
Ke Liu, Fan Hu, Hui Lin, Xi Cheng, Jianan Chen, Jilin Song, Siyuan Feng, Gaofeng Su, Chen Zhu
TL;DR
The paper addresses the optimization of Ground Delay Programs (GDPs) under uncertainty in the National Airspace System (NAS). It applies two offline reinforcement learning approaches, Behavioral Cloning (BC) and Conservative Q-Learning (CQL), within a time-sequential GDP simulation called SAGDP_ENV, using 2019 data from Newark Liberty International Airport (EWR) to adjust GDP parameters. The reward combines ground delays ($GD_{t+i}$), airborne delays ($AD_{t+i}$), and terminal-area congestion with defined costs ($c_{gnd}=1$, $c_{air}=2.5$, $p=10$) over a horizon of $n=8$ intervals. The models struggled to learn effectively, which the authors tentatively attribute to oversimplified environment modeling and data limitations. Practical benefits in ATM will likely require more faithful weather integration and broader GDP parameterization.
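As a rough illustration of the reward described above, the sketch below sums weighted ground delay, airborne delay, and congestion over the $n=8$ look-ahead intervals and negates the total. The exact functional form is not given in this summary, so the linear combination, the use of $p$ as the weight on the congestion term, and the function name `gdp_reward` are all assumptions made for illustration.

```python
from typing import Sequence

C_GND = 1.0    # c_gnd: cost per unit of ground delay (value from the summary)
C_AIR = 2.5    # c_air: cost per unit of airborne delay (value from the summary)
P_CONG = 10.0  # p: assumed here to weight the terminal-area congestion term
HORIZON = 8    # n: number of look-ahead intervals


def gdp_reward(
    ground_delay: Sequence[float],
    airborne_delay: Sequence[float],
    congestion: Sequence[float],
    horizon: int = HORIZON,
) -> float:
    """Hypothetical reward: negative weighted cost over the next `horizon` intervals.

    Each sequence holds one value per interval (GD_{t+i}, AD_{t+i}, and an
    assumed non-negative congestion measure).
    """
    if min(len(ground_delay), len(airborne_delay), len(congestion)) < horizon:
        raise ValueError("each input needs at least `horizon` entries")
    cost = 0.0
    for i in range(horizon):
        cost += (
            C_GND * ground_delay[i]
            + C_AIR * airborne_delay[i]
            + P_CONG * congestion[i]
        )
    # Lower total delay and congestion should yield a higher reward.
    return -cost
```

With $c_{air} > c_{gnd}$, a policy scored this way prefers holding flights on the ground over letting them absorb the same delay in the air, which matches the usual rationale for a GDP.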
Abstract
This paper explores the optimization of Ground Delay Programs (GDPs), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. To manage the inherent uncertainties in the national airspace system, such as weather variability, fluctuating flight demand, and airport arrival rates, we employ Reinforcement Learning (RL) and develop two models: Behavioral Cloning (BC) and Conservative Q-Learning (CQL). These models are designed to improve GDP efficiency through a reward function that integrates ground delays, airborne delays, and terminal area congestion. We constructed a simulated single-airport environment, SAGDP_ENV, which incorporates real operational data along with predicted uncertainties to support realistic decision-making scenarios. Using the full year of 2019 data from Newark Liberty International Airport (EWR), our models aimed to set airport program rates preemptively. Despite thorough modeling and simulation, initial results indicated that the models struggled to learn effectively, potentially because of oversimplified environmental assumptions. This paper discusses the challenges encountered, evaluates the models' performance against actual operational data, and outlines future directions for refining RL applications in ATM.
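For readers unfamiliar with CQL, the following minimal sketch shows the conservative regularizer in its commonly used discrete-action form: for each logged state, the log-sum-exp of the Q-values over all actions minus the Q-value of the action actually taken in the data. It assumes a discrete action set (for example, a finite menu of program rates), which is an assumption and not something stated in the abstract, and the name `cql_penalty` is hypothetical. It is not the authors' implementation.

```python
import math
from typing import Sequence


def cql_penalty(
    q_values: Sequence[Sequence[float]],
    data_actions: Sequence[int],
) -> float:
    """Mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values[k][a] is the Q-value of action a in the k-th logged state;
    data_actions[k] is the index of the action recorded in the dataset.
    """
    gaps = []
    for q_row, action in zip(q_values, data_actions):
        # Subtract the row maximum before exponentiating for numerical stability.
        row_max = max(q_row)
        log_sum_exp = row_max + math.log(sum(math.exp(q - row_max) for q in q_row))
        gaps.append(log_sum_exp - q_row[action])
    return sum(gaps) / len(gaps)
```

In a full CQL objective this term is scaled by a coefficient and added to the usual temporal-difference loss. It pushes down Q-values of actions absent from the data, which is what makes the method suited to learning from logged GDP decisions without further interaction.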
