AIRHILT: A Human-in-the-Loop Testbed for Multimodal Conflict Detection in Aviation

Omar Garib, Jayaprakash D. Kambhampaty, Olivia J. Pinon Fischer, Dimitri N. Mavris

TL;DR

AIRHILT addresses the need for scalable, reproducible testing of multimodal pilot and ATC assistive systems for conflict detection. It introduces a Godot-based simulation with synchronized radio, camera, and ADS-B data streams, a modular architecture with interchangeable ASR, vision, and decision modules, and a scenario suite spanning terminal and en route conflicts. A reference pipeline demonstrates end-to-end operation using Whisper ASR, YOLO vision, ADS-B logic, and an LLM-based decision layer, and reports per-modality latencies along with an average time-to-first-warning of approximately 7.7 s in runway-overlap scenarios. The work provides open-source artifacts and a structured framework for reproducible multimodal situational awareness research with pilot and controller in the loop, supporting rapid exploration of candidate assistive architectures and evaluation protocols.

Abstract

We introduce AIRHILT (Aviation Integrated Reasoning, Human-in-the-Loop Testbed), a modular and lightweight simulation environment designed to evaluate multimodal pilot and air traffic control (ATC) assistance systems for aviation conflict detection. Built on the open-source Godot engine, AIRHILT synchronizes pilot and ATC radio communications, visual scene understanding from camera streams, and ADS-B surveillance data within a unified, scalable platform. The environment supports pilot- and controller-in-the-loop interactions, providing a comprehensive scenario suite covering both terminal area and en route operational conflicts, including communication errors and procedural mistakes. AIRHILT offers standardized JSON-based interfaces that enable researchers to easily integrate, swap, and evaluate automatic speech recognition (ASR), visual detection, decision-making, and text-to-speech (TTS) models. We demonstrate AIRHILT through a reference pipeline incorporating fine-tuned Whisper ASR, YOLO-based visual detection, ADS-B-based conflict logic, and GPT-OSS-20B structured reasoning, and present preliminary results from representative runway-overlap scenarios, where the assistant achieves an average time-to-first-warning of approximately 7.7 s, with average ASR and vision latencies of approximately 5.9 s and 0.4 s, respectively. The AIRHILT environment and scenario suite are openly available, supporting reproducible research on multimodal situational awareness and conflict detection in aviation; code and scenarios are available at https://github.com/ogarib3/airhilt.
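The idea of swapping ASR, vision, and decision modules behind a common JSON interface can be illustrated with a minimal Python sketch. It is not taken from the AIRHILT repository: the `ModuleMessage` envelope, its field names, the `ModuleRegistry` class, and the `stub_asr` handler are all hypothetical and stand in for whatever schema the testbed actually defines.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict

# Hypothetical envelope: these field names are assumptions for illustration,
# not the schema used by AIRHILT.
@dataclass
class ModuleMessage:
    module: str               # e.g. "asr", "vision", "adsb", "decision"
    sim_time_s: float         # simulation time at which the input was captured
    payload: Dict[str, Any]   # module-specific output
    latency_s: float = 0.0    # wall-clock processing time of the module

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ModuleMessage":
        return ModuleMessage(**json.loads(text))


Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class ModuleRegistry:
    """Maps a module name to a handler so implementations can be swapped."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def run(self, name: str, sim_time_s: float, request: Dict[str, Any]) -> str:
        # Time the handler and wrap its output in the shared JSON envelope.
        start = time.perf_counter()
        payload = self._handlers[name](request)
        latency_s = time.perf_counter() - start
        return ModuleMessage(name, sim_time_s, payload, latency_s).to_json()


def stub_asr(request: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for a real ASR backend; returns a fixed-format stub.
    return {"transcript": f"<stub transcript for {request['audio_id']}>"}


registry = ModuleRegistry()
registry.register("asr", stub_asr)
message = ModuleMessage.from_json(
    registry.run("asr", 12.5, {"audio_id": "clip-001"})
)
print(message.module, message.payload["transcript"])
```

Because every module returns the same envelope, replacing `stub_asr` with a different backend only requires registering another handler under the same name; downstream consumers keep parsing the same JSON.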

Paper Structure

This paper contains 33 sections, 1 equation, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: AIRHILT at a glance. Cockpit view from the simulation showing the setting used for pilot‑in‑the‑loop evaluations. AIRHILT synchronizes radio traffic, vision feeds, and ADS‑B to test end‑to‑end assistive warning pipelines.
  • Figure 2: Canonical topology alternatives for assistive processing in the ATC-pilot loop. (a) ASR/SE first: audio is enhanced (SE) and/or transcribed (ASR) first; the resulting output directly informs the advisory presented to the pilot; (b) Parallel paths: raw audio reaches the pilot while a copy is processed by ASR/SE in parallel; (c) Assistant-gated fusion: ASR/SE, vision, and ADS-B are fused before any advisory is issued. All variants incorporate vision detections and ADS-B tracks within a common decision layer that outputs graded advisories to the pilot and/or controller.
  • Figure 3: Modular simulation environment and onboard assistant architecture. A) Godot-based environment with scenario orchestration (L0), actors (L1), and I/O subsystems (L2A/L2B). B) Multimodal assistant with pluggable decision engine and advisory output. C) Built-in logging for per-modality and end-to-end latencies.
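
The latency logging mentioned for Figure 3 (panel C) can be sketched as a small post-processing step over an event log. The sketch below is an assumption-laden illustration in Python, not the testbed's logger: the `LogEvent` record, its `kind` values, and the definition of time-to-first-warning as the gap between conflict onset and the first advisory are all hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical log record; the event kinds and fields are assumptions.
@dataclass(frozen=True)
class LogEvent:
    t_s: float              # simulation timestamp in seconds
    kind: str               # "conflict_onset", "module_done", or "advisory"
    module: str = ""        # set for "module_done" events
    latency_s: float = 0.0  # processing time reported by the module


def time_to_first_warning(events: List[LogEvent]) -> Optional[float]:
    """Seconds from conflict onset to the first advisory at or after it.

    Assumes this definition of the metric; returns None if either event
    is missing from the log.
    """
    onsets = [e.t_s for e in events if e.kind == "conflict_onset"]
    if not onsets:
        return None
    onset = min(onsets)
    warnings = [e.t_s for e in events if e.kind == "advisory" and e.t_s >= onset]
    return min(warnings) - onset if warnings else None


def mean_latency_by_module(events: List[LogEvent]) -> Dict[str, float]:
    """Average reported latency per module over all 'module_done' events."""
    grouped: Dict[str, List[float]] = {}
    for e in events:
        if e.kind == "module_done":
            grouped.setdefault(e.module, []).append(e.latency_s)
    return {name: mean(values) for name, values in grouped.items()}


# Illustrative values only; they are not results from the paper.
log = [
    LogEvent(10.0, "conflict_onset"),
    LogEvent(12.0, "module_done", module="asr", latency_s=2.0),
    LogEvent(12.5, "module_done", module="vision", latency_s=0.5),
    LogEvent(14.5, "advisory"),
    LogEvent(16.0, "module_done", module="asr", latency_s=4.0),
    LogEvent(18.0, "advisory"),
]
print(time_to_first_warning(log))
print(mean_latency_by_module(log))
```

Keeping per-module latencies and the end-to-end metric in the same log makes it straightforward to see which stage dominates the delay before a warning is issued.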