CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management

Sinan Abdulhak, Wayne Hubbard, Karthik Gopalakrishnan, Max Z. Li

TL;DR

The paper investigates deploying large language models (LLMs) to support strategic air traffic flow management by summarizing historical Ground Delay Program (GDP) data. It builds ChatATC, an LLM-based conversational agent trained on a GDP dataset spanning 2000–2023 (~86,842 GDP issuances), and examines two training paradigms: in-prompt learning and fine-tuning (with per-airport instances such as SFO and EWR). Findings show that ChatATC can retrieve and summarize GDP attributes (rates, durations, reasons) but struggles with superlative queries and exact date extraction, highlighting safety and reliability considerations in non-safety-critical air traffic management (ATM) contexts. The work also presents a GUI design to facilitate human-machine collaboration and outlines future directions, including baseline evaluations, support for a broader set of Traffic Management Initiatives (TMIs), and testing beyond GDPs on other ATM actions. The study demonstrates a data-driven pathway to augment situational awareness and training in the National Airspace System (NAS), while stressing careful deployment to avoid over-reliance and ensure verifiability.

Abstract

Generative artificial intelligence (AI) and large language models (LLMs) have gained rapid popularity through publicly available tools such as ChatGPT. The adoption of LLMs for personal and professional use is fueled by the natural interactions between human users and computer applications such as ChatGPT, along with powerful summarization and text generation capabilities. Given the widespread use of such generative AI tools, in this work we investigate how these tools can be deployed in a non-safety-critical, strategic traffic flow management setting. Specifically, we train an LLM, CHATATC, based on a large historical data set of Ground Delay Program (GDP) issuances, spanning 2000-2023 and consisting of over 80,000 GDP implementations, revisions, and cancellations. We test the query and response capabilities of CHATATC, documenting successes (e.g., providing correct GDP rates, durations, and reasons) and shortcomings (e.g., superlative questions). We also detail the design of a graphical user interface for future users to interact and collaborate with the CHATATC conversational agent.

Paper Structure (16 sections, 5 figures)

Figures (5)

  • Figure 1: Average GDP duration from 2010 to 2023. Note that in July 2020 there was only one recorded GDP in the data set.
  • Figure 2: Percent of GDPs by airport from 2010 to 2023.
  • Figure 3: GDP rates from 2010 to 2023 for three major New York City airports. We assume GDP rates are nominally aircraft per hour; Traffic Managers may customize this to be, e.g., aircraft per 15 minutes, but this is rare.
  • Figure 4: Home page for ChatATC in GUI wireframe.
  • Figure 5: GUI output for ChatATC query with GDP parameters.