AIRHILT: A Human-in-the-Loop Testbed for Multimodal Conflict Detection in Aviation
Omar Garib, Jayaprakash D. Kambhampaty, Olivia J. Pinon Fischer, Dimitri N. Mavris
TL;DR
AIRHILT addresses the need for scalable, reproducible testing of multimodal pilot and ATC assistive systems for conflict detection. It introduces a Godot-based simulation with synchronized radio, camera, and ADS-B data streams, a modular architecture with interchangeable ASR, vision, and decision modules, and a scenario suite spanning terminal and en route conflicts. A reference pipeline demonstrates end-to-end operation using Whisper ASR, YOLO vision, ADS-B logic, and an LLM-based decision layer, reporting latency metrics and a time-to-first-warning of roughly 7.66 s for runway-overlap scenarios. The work provides open-source artifacts and a structured framework for reproducible multimodal situational awareness research with pilots and controllers in the loop, supporting rapid exploration of candidate assistive architectures and evaluation protocols.
Abstract
We introduce AIRHILT (Aviation Integrated Reasoning, Human-in-the-Loop Testbed), a modular and lightweight simulation environment for evaluating multimodal pilot and air traffic control (ATC) assistance systems for aviation conflict detection. Built on the open-source Godot engine, AIRHILT synchronizes pilot and ATC radio communications, visual scene understanding from camera streams, and ADS-B surveillance data within a unified, scalable platform. The environment supports pilot- and controller-in-the-loop interactions and provides a scenario suite covering both terminal area and en route operational conflicts, including communication errors and procedural mistakes. AIRHILT offers standardized JSON-based interfaces that let researchers integrate, swap, and evaluate automatic speech recognition (ASR), visual detection, decision-making, and text-to-speech (TTS) models. We demonstrate AIRHILT through a reference pipeline incorporating fine-tuned Whisper ASR, YOLO-based visual detection, ADS-B-based conflict logic, and GPT-OSS-20B structured reasoning. In preliminary results from representative runway-overlap scenarios, the assistant achieves an average time-to-first-warning of approximately 7.7 s, with average ASR and vision latencies of approximately 5.9 s and 0.4 s, respectively. The AIRHILT environment and scenario suite are openly available at https://github.com/ogarib3/airhilt, supporting reproducible research on multimodal situational awareness and conflict detection in aviation.
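To make the idea of swappable modules behind a JSON interface concrete, the following is a minimal Python sketch and not AIRHILT's actual schema or code. The envelope fields (`source`, `t_sim`, `payload`), the `runway_occupied` flag, the stub modules, and the helper names are all hypothetical. The sketch also assumes that time-to-first-warning is measured from a scenario-defined conflict onset time, which the abstract does not specify.

```python
import json
from typing import Callable, List, Optional

# A module is any callable that maps one JSON string to another, so an ASR,
# vision, or decision component can be swapped without touching the others.
Module = Callable[[str], str]


def make_message(source: str, t_sim: float, payload: dict) -> str:
    """Serialize a hypothetical message envelope (not AIRHILT's real schema)."""
    return json.dumps({"source": source, "t_sim": t_sim, "payload": payload})


def stub_asr(msg: str) -> str:
    """Stand-in for an ASR module: adds a fixed transcript to the payload."""
    m = json.loads(msg)
    payload = dict(m["payload"], transcript="cleared to land runway 27")
    return make_message("asr", m["t_sim"], payload)


def stub_decision(msg: str) -> str:
    """Stand-in for a decision module: warn on a landing clearance while
    the hypothetical runway_occupied flag is set."""
    m = json.loads(msg)
    p = m["payload"]
    warn = "cleared to land" in p.get("transcript", "") and bool(p.get("runway_occupied"))
    return make_message("decision", m["t_sim"], dict(p, warning=warn))


def run_chain(msg: str, modules: List[Module]) -> str:
    """Pass one message through the modules in order."""
    for module in modules:
        msg = module(msg)
    return msg


def time_to_first_warning(outputs: List[str], t_onset: float) -> Optional[float]:
    """Seconds from the assumed conflict onset to the first warning, or None."""
    for out in outputs:
        m = json.loads(out)
        if m["payload"].get("warning") and m["t_sim"] >= t_onset:
            return m["t_sim"] - t_onset
    return None


# Three simulated frames; the runway becomes occupied from t = 12 s onward.
frames = [
    make_message("sim", t, {"runway_occupied": occupied})
    for t, occupied in [(10.0, False), (12.0, True), (14.0, True)]
]
outputs = [run_chain(f, [stub_asr, stub_decision]) for f in frames]
ttfw = time_to_first_warning(outputs, t_onset=11.0)
print(ttfw)
```

Because every stage shares the same string-in, string-out signature, replacing `stub_asr` with a different recognizer only requires another callable of type `Module`; the chain and the metric code stay unchanged.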
