Knots: A Large-Scale Multi-Agent Enhanced Expert-Annotated Dataset and LLM Prompt Optimization for NOTAM Semantic Parsing
Maoqi Liu, Quan Fang, Yang Yang, Can Zhao, Kaiquan Cai
TL;DR
This work reframes NOTAM interpretation as a semantic parsing task that requires domain knowledge and introduces Knots, a large expert-annotated dataset of 12,347 records spanning 194 Flight Information Regions (FIRs). It proposes a two-stage, data-centric and model-centric framework (MDA-HDF) that first discovers candidate information fields and then refines them through structured debate to balance recall and precision. A broad range of prompting strategies and domain-specific optimizations is systematically evaluated across multiple LLMs. The results show that 5-shot in-context learning with deterministic generation performs best and that the multi-agent setup significantly improves both field discovery and parsing accuracy. The findings offer practical guidelines for automated NOTAM analysis, including dataset-driven improvements, robust prompting strategies, and cautious use of advanced reasoning to ensure reliability in safety-critical aviation applications.
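The 5-shot, deterministic-decoding setup highlighted in the TL;DR can be illustrated with a small prompt builder. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Demo` record, the prompt template, and the `DETERMINISTIC_DECODING` settings are hypothetical, and how those settings reach a model depends on the LLM client in use.

```python
from dataclasses import dataclass

# Hypothetical decoding settings: temperature 0 approximates greedy (deterministic) decoding.
DETERMINISTIC_DECODING = {"temperature": 0.0, "top_p": 1.0}


@dataclass(frozen=True)
class Demo:
    """One annotated demonstration: a raw NOTAM and its expected structured output."""
    notam: str
    parsed_json: str


def build_few_shot_prompt(instruction: str, demos: list[Demo], query: str, k: int = 5) -> str:
    """Assemble a k-shot prompt: instruction, the first k demonstrations, then the query."""
    if len(demos) < k:
        raise ValueError(f"need at least {k} demonstrations, got {len(demos)}")
    parts = [instruction.strip(), ""]
    for i, demo in enumerate(demos[:k], start=1):
        parts += [f"### Example {i}", f"NOTAM:\n{demo.notam}", f"Output:\n{demo.parsed_json}", ""]
    # The model is expected to continue after the final "Output:" with its own parse.
    parts += ["### Query", f"NOTAM:\n{query}", "Output:"]
    return "\n".join(parts)
```

In practice the demonstrations would come from an annotated training split, and the prompt would be sent together with the decoding settings. Which five examples to use (fixed, random, or retrieved by similarity to the query) is a design choice this sketch leaves open.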
Abstract
Notices to Air Missions (NOTAMs) serve as a critical channel for disseminating key flight safety information, yet their complex linguistic structures and reliance on implicit reasoning pose significant challenges for automated parsing. Existing research focuses mainly on surface-level tasks such as classification and named entity recognition and lacks deep semantic understanding. To address this gap, we propose NOTAM semantic parsing, a task that emphasizes semantic inference and the integration of aviation domain knowledge to produce structured, inference-rich outputs. To support this task, we construct Knots (Knowledge and NOTAM Semantics), a high-quality dataset of 12,347 expert-annotated NOTAMs covering 194 Flight Information Regions, enhanced through a multi-agent collaborative framework for comprehensive field discovery. We systematically evaluate a wide range of prompt-engineering strategies and model-adaptation techniques, achieving substantial improvements in aviation text understanding and processing. Our experimental results demonstrate the effectiveness of the proposed approach and offer practical insights for automated NOTAM analysis systems. Our code is available at: https://github.com/Estrellajer/Knots.
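To make "structured, inference-rich output" concrete, here is a toy rule-based sketch. It reads the standard ICAO NOTAM items A (location), B and C (validity, YYMMDDHHMM in UTC, or PERM), and E (free text). It then adds fields that are not written literally in the message: duration, the closed runway, an abbreviation-expanded reading, and whether the notice is in force at a given time. It only illustrates the task. The paper performs parsing with LLMs, and the output schema, field names such as `closed_runway`, the helper `is_active`, and the sample NOTAM are hypothetical, not the Knots annotation format.

```python
import re
from datetime import datetime, timezone

# A few standard ICAO abbreviations; a real system would need a far larger lexicon.
ABBREVIATIONS = {"RWY": "runway", "CLSD": "closed", "MAINT": "maintenance"}
ABBREV_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b")


def parse_notam_time(value: str) -> datetime:
    """Convert an ICAO-style YYMMDDHHMM timestamp, given in UTC, to a datetime."""
    return datetime.strptime(value, "%y%m%d%H%M").replace(tzinfo=timezone.utc)


def parse_notam(text: str) -> dict:
    """Extract items A, B, C, E, then add inferred fields (hypothetical schema)."""
    patterns = {
        "A": r"\bA\)\s*([A-Z]{4})",
        "B": r"\bB\)\s*(\d{10})",
        "C": r"\bC\)\s*(\d{10}|PERM)",
        "E": r"\bE\)\s*(.+)",  # free text up to the end of its line in this toy version
    }
    items = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            items[key] = match.group(1).strip()

    start = parse_notam_time(items["B"]) if "B" in items else None
    permanent = items.get("C") == "PERM"
    end = parse_notam_time(items["C"]) if "C" in items and not permanent else None
    free_text = items.get("E", "")

    # Inferred fields: derived from the text rather than stated in it.
    runway = re.search(r"\bRWY\s+(\d{2}[LRC]?/\d{2}[LRC]?)\s+CLSD\b", free_text)
    has_window = start is not None and end is not None
    return {
        "location": items.get("A"),
        "start": start,
        "end": end,
        "permanent": permanent,
        "duration_hours": (end - start).total_seconds() / 3600 if has_window else None,
        "closed_runway": runway.group(1) if runway else None,
        "plain_text": ABBREV_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], free_text),
    }


def is_active(parsed: dict, at: datetime) -> bool:
    """Infer whether the NOTAM is in force at a given UTC instant."""
    if parsed["start"] is None or at < parsed["start"]:
        return False
    return parsed["permanent"] or (parsed["end"] is not None and at <= parsed["end"])


# Hypothetical sample NOTAM, for illustration only.
SAMPLE = "A) ZBAA B) 2403011200 C) 2403011800\nE) RWY 01/19 CLSD DUE TO MAINT."
```

For `SAMPLE`, this yields a six-hour runway 01/19 closure that `is_active` reports as in force at 13:00 UTC on 1 March 2024. Hand-written rules like these break quickly on the free-form wording of item E, which is where domain knowledge and LLM-based inference become necessary.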
