NOTAM-Evolve: A Knowledge-Guided Self-Evolving Optimization Framework with LLMs for NOTAM Interpretation
Maoqi Liu, Quan Fang, Yuhao Wu, Can Zhao, Yang Yang, Kaiquan Cai
TL;DR
NOTAMs are written in condensed, domain-specific language, so interpreting them requires deep parsing that combines dynamic knowledge grounding with schema-based inference. The authors introduce NOTAM-Evolve, a self-evolving framework with three components: knowledge-grounded retrieval via a knowledge graph-augmented TableRAG, iterative supervised and preference-based refinement, and multi-view inference with rewriting and voting. A new benchmark of 10,000 expert-annotated NOTAMs accompanies the framework. Empirically, NOTAM-Evolve achieves a 30.4% absolute accuracy improvement over the base LLM and approaches the performance of commercial models while remaining open-source. By integrating grounding with adaptive learning, the work points to practical advances for safety-critical NOTAM analysis and potential applicability to other high-stakes, data-sparse domains.
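To make the voting part of multi-view inference concrete, here is a minimal Python sketch of majority voting over candidate structured interpretations. It is an illustration under stated assumptions, not the authors' implementation: the function name `majority_vote`, the field names `facility` and `status`, and the idea that each rewritten view yields one dictionary are all hypothetical.

```python
import json
from collections import Counter


def majority_vote(candidates):
    """Return the most frequent candidate and its share of the votes.

    `candidates` is a list of JSON-serializable dicts, assumed here to be
    one structured interpretation per rewritten view of the same NOTAM.
    Ties resolve to the candidate that was seen first.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Canonicalize each dict so that key order does not split the vote.
    keys = [json.dumps(c, sort_keys=True) for c in candidates]
    best_key, votes = Counter(keys).most_common(1)[0]
    return json.loads(best_key), votes / len(keys)


# Hypothetical outputs from three views of one NOTAM (illustrative values).
views = [
    {"facility": "RWY 09L/27R", "status": "CLOSED"},
    {"status": "CLOSED", "facility": "RWY 09L/27R"},
    {"facility": "RWY 09L/27R", "status": "RESTRICTED"},
]
winner, share = majority_vote(views)
```

Serializing with sorted keys is one simple way to treat two dictionaries with the same content as the same vote; a real system might instead vote per field or weight views by confidence.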
Abstract
Accurate interpretation of Notices to Airmen (NOTAMs) is critical for aviation safety, yet their condensed and cryptic language poses significant challenges to both manual and automated processing. Existing automated systems are typically limited to shallow parsing, failing to extract the actionable intelligence needed for operational decisions. We formalize the complete interpretation task as deep parsing, a dual-reasoning challenge requiring both dynamic knowledge grounding (linking the NOTAM to evolving real-world aeronautical data) and schema-based inference (applying static domain rules to deduce operational status). To tackle this challenge, we propose NOTAM-Evolve, a self-evolving framework that enables a large language model (LLM) to autonomously master complex NOTAM interpretation. Leveraging a knowledge graph-enhanced retrieval module for data grounding, the framework introduces a closed-loop learning process where the LLM progressively improves from its own outputs, minimizing the need for extensive human-annotated reasoning traces. In conjunction with this framework, we introduce a new benchmark dataset of 10,000 expert-annotated NOTAMs. Our experiments demonstrate that NOTAM-Evolve achieves a 30.4% absolute accuracy improvement over the base LLM, establishing a new state of the art on the task of structured NOTAM interpretation.
