
NOTAM-Evolve: A Knowledge-Guided Self-Evolving Optimization Framework with LLMs for NOTAM Interpretation

Maoqi Liu, Quan Fang, Yuhao Wu, Can Zhao, Yang Yang, Kaiquan Cai

TL;DR

NOTAM interpretation is challenged by condensed, domain-specific language requiring deep parsing that combines dynamic knowledge grounding with schema-based inference. The authors introduce NOTAM-Evolve, a self-evolving framework featuring knowledge-grounded retrieval via a knowledge graph-augmented TableRAG, iterative supervised and preference-based refinement, and multi-view inference with rewriting and voting. A new benchmark of 10,000 expert-annotated NOTAMs accompanies the framework. Empirically, NOTAM-Evolve achieves a 30.4% absolute improvement over the base LLM and approaches commercial model performance while remaining open-source, signaling practical advances for safety-critical NOTAM analysis and potential applicability to other high-stakes, data-sparse domains. The work demonstrates a scalable path to robust automated NOTAM interpretation through integrated grounding and adaptive learning.
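The iterative supervised and preference-based refinement can be pictured as a loop in which the current model's own outputs become the next round's training data. The sketch below shows one such round under assumptions this summary does not confirm. Correctness is judged by exact match against the expert annotation. Correct samples become supervised targets, and a correct/incorrect pair becomes a preference example (for instance, for a DPO-style objective). `build_round_data`, `PreferencePair` and `sample_fn` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # a self-generated output judged correct
    rejected: str  # a self-generated output judged incorrect


def build_round_data(
    notams: List[str],
    gold: Dict[str, str],
    sample_fn: Callable[[str], List[str]],  # hypothetical: samples outputs from the current model
) -> Tuple[List[Tuple[str, str]], List[PreferencePair]]:
    """One hypothetical self-evolution round (illustrative, not the paper's algorithm)."""
    sft: List[Tuple[str, str]] = []
    prefs: List[PreferencePair] = []
    for notam in notams:
        candidates = sample_fn(notam)
        # ASSUMPTION: exact string match against the expert label decides correctness.
        correct = [c for c in candidates if c == gold[notam]]
        wrong = [c for c in candidates if c != gold[notam]]
        if correct:
            sft.append((notam, correct[0]))  # supervised target for the next round
            if wrong:
                prefs.append(PreferencePair(notam, correct[0], wrong[0]))
    return sft, prefs
```

NOTAMs for which the model produced no correct sample contribute nothing in this sketch. A real loop would presumably retry them in later rounds as the model improves.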

Abstract

Accurate interpretation of Notices to Airmen (NOTAMs) is critical for aviation safety, yet their condensed and cryptic language poses significant challenges to both manual and automated processing. Existing automated systems are typically limited to shallow parsing, failing to extract the actionable intelligence needed for operational decisions. We formalize the complete interpretation task as deep parsing, a dual-reasoning challenge requiring both dynamic knowledge grounding (linking the NOTAM to evolving real-world aeronautical data) and schema-based inference (applying static domain rules to deduce operational status). To tackle this challenge, we propose NOTAM-Evolve, a self-evolving framework that enables a large language model (LLM) to autonomously master complex NOTAM interpretation. Leveraging a knowledge graph-enhanced retrieval module for data grounding, the framework introduces a closed-loop learning process where the LLM progressively improves from its own outputs, minimizing the need for extensive human-annotated reasoning traces. In conjunction with this framework, we introduce a new benchmark dataset of 10,000 expert-annotated NOTAMs. Our experiments demonstrate that NOTAM-Evolve achieves a 30.4% absolute accuracy improvement over the base LLM, establishing a new state of the art on the task of structured NOTAM interpretation.
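The following minimal Python sketch makes the shallow/deep distinction concrete. The regex step only splits a NOTAM into its lettered ICAO items (A, B, C, E). The deeper step grounds item A against a base table to infer whether any runway remains usable, a conclusion the NOTAM text alone does not state. The aerodrome codes `XAAA`/`XBBB`, the `BASE_RUNWAYS` table and the function names are hypothetical illustrations, not data or code from the paper.

```python
import re
from typing import Dict, List

# HYPOTHETICAL base table (illustrative data, not real aerodromes): runways per aerodrome.
BASE_RUNWAYS: Dict[str, List[str]] = {
    "XAAA": ["09/27"],
    "XBBB": ["09/27", "18/36"],
}

# Lettered item such as "A) XAAA", ending where the next "X)" item starts or at end of text.
ITEM_RE = re.compile(r"\b([A-GQ])\)\s*(.*?)(?=\s+[A-GQ]\)|\Z)", re.S)
CLOSED_RWY_RE = re.compile(r"\bRWY\s+(\d{2}[LRC]?/\d{2}[LRC]?)\s+CLSD\b")


def shallow_parse(notam: str) -> Dict[str, str]:
    """Shallow parsing: split the NOTAM into its lettered items with a regex."""
    return {k: v.strip() for k, v in ITEM_RE.findall(notam)}


def deep_parse(notam: str) -> Dict[str, object]:
    """Toy deep parsing: combine the parsed items with the base table to infer status."""
    items = shallow_parse(notam)
    airport = items["A"]
    closed = CLOSED_RWY_RE.findall(items.get("E", ""))
    runways = BASE_RUNWAYS.get(airport, [])
    open_rwys = [r for r in runways if r not in closed]
    return {
        "airport": airport,
        "closed_runways": closed,
        "open_runways": open_rwys,
        # Requires grounding: only the base table reveals that no runway is left open.
        "no_runway_available": bool(runways) and not open_rwys,
    }
```

With `A) XAAA ... E) RWY 09/27 CLSD`, `deep_parse` reports that no runway is available. The same text for `XBBB` leaves `18/36` open, even though the shallow parse of the two NOTAMs is identical apart from the aerodrome code.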

Paper Structure

This paper contains 24 sections, 9 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: An illustration of a NOTAM and its information parsing process. A NOTAM is a safety alert that reports flight hazards, and its original format ('Example Input') is unstructured text. Information parsing aims to convert this text into structured data ('Example Output'). Earlier 'Shallow Parsing' approaches relied on techniques such as regular expressions or traditional named entity recognition (NER). In contrast, 'Deep Parsing', as we define it, requires a model to combine domain knowledge with dynamic data in complex reasoning to understand the text's deep semantics.
  • Figure 2: Overall framework of our proposed NOTAM-Evolve: (1) Knowledge-Grounded Retrieval: The final outputs are grounded in a set of base tables that represent real-world conditions, e.g., the number of runways at an airport. (2) Self-Optimizing Model Refinement: Our base model gains proficiency in handling complex instructions in NOTAM analysis scenarios through iterative self-evolution that combines supervised and preference optimization. (3) Multi-View Inference with Rewriting & Voting: We rephrase the original NOTAM without altering its core content, extract information from the multiple resulting texts, and determine the final answer via a voting mechanism.
  • Figure 3: Iterative Optimization Performance (Accuracy %) across NOTAM Categories.
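Part (3) of Figure 2 can be sketched as plain majority voting over the answers extracted from several views of the same NOTAM. This is an illustrative assumption, not the authors' exact procedure. `rewrite_fn` and `extract_fn` are hypothetical stand-ins for LLM calls, the original NOTAM is counted as one view, and ties go to the earliest view.

```python
from collections import Counter
from typing import Callable, Hashable, TypeVar

T = TypeVar("T", bound=Hashable)


def multi_view_vote(
    notam: str,
    rewrite_fn: Callable[[str, int], str],  # hypothetical: i-th content-preserving rephrasing
    extract_fn: Callable[[str], T],         # hypothetical: structured extraction from one view
    n_rewrites: int = 4,
) -> T:
    # ASSUMPTION: the original NOTAM votes alongside its rewrites.
    views = [notam] + [rewrite_fn(notam, i) for i in range(n_rewrites)]
    answers = [extract_fn(v) for v in views]
    counts = Counter(answers)
    best = max(counts.values())
    # Tie-break: the first answer (in view order) with the top count wins,
    # so the original NOTAM's answer is preferred among tied answers.
    return next(a for a in answers if counts[a] == best)
```

Answers must be hashable to be counted. A structured output such as a dict would first be converted, for example to a tuple of sorted key/value pairs.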