
World2Rules: A Neuro-Symbolic Framework for Learning World-Governing Safety Rules for Aviation

Haichuan Wang, Jay Patrikar, Sebastian Scherer

Abstract

Many real-world safety-critical systems are governed by explicit rules that define unsafe world configurations and constrain agent interactions. In practice, these rules are complex and context-dependent, making manual specification incomplete and error-prone. Learning such rules from real-world multimodal data is further challenged by noise, inconsistency, and sparse failure cases. Neural models can extract structure from text and visual data but lack formal guarantees, while symbolic methods provide verifiability yet are brittle when applied directly to imperfect observations. We present World2Rules, a neuro-symbolic framework for learning world-governing safety rules from real-world multimodal aviation data. World2Rules learns from both nominal operational data and aviation crash and incident reports, treating neural models as proposal mechanisms for candidate symbolic facts and inductive logic programming as a verification layer. The framework employs hierarchical reflective reasoning, enforcing consistency across examples, subsets, and rules to filter unreliable evidence, aggregate only mutually consistent components, and prune unsupported hypotheses. This design limits error propagation from noisy neural extractions and yields compact, interpretable first-order logic rules that characterize unsafe world configurations. We evaluate World2Rules on real-world aviation safety data and show that it learns rules that achieve a 23.6% higher F1 score than a purely neural baseline and a 43.2% higher F1 score than a single-pass neuro-symbolic baseline, while remaining suitable for safety-critical reasoning and formal analysis.

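
The abstract describes a filter-aggregate-prune feedback pattern: neural extractions are filtered for per-example consistency, rules are admitted only if they are negative-safe, and hypotheses without positive support are pruned. The following is an illustrative sketch of that pattern, not the paper's implementation; all function names and the toy data are hypothetical.

```python
# Hypothetical sketch of the filter -> aggregate -> prune feedback
# loop described in the abstract. Facts are strings, examples are
# sets of facts, and rules are predicates over examples.
from collections import Counter

def filter_examples(extractions, min_votes=2):
    """Example level: keep only facts proposed consistently across
    repeated neural extractions of the same report."""
    votes = Counter(fact for run in extractions for fact in set(run))
    return {fact for fact, n in votes.items() if n >= min_votes}

def aggregate_rules(candidate_rules, neg_examples):
    """Subset level: admit a rule only if it is negative-safe,
    i.e. it covers no accepted negative example."""
    return [r for r in candidate_rules
            if not any(r(e) for e in neg_examples)]

def prune_rules(rules, pos_examples, min_support=1):
    """Rule level: drop hypotheses lacking positive support."""
    return [r for r in rules
            if sum(r(e) for e in pos_examples) >= min_support]

# Toy run: three noisy extractions of one incident report.
extractions = [
    {"on_runway(a)", "active(r)"},
    {"on_runway(a)", "active(r)", "spurious(x)"},
    {"on_runway(a)"},
]
facts = filter_examples(extractions)  # "spurious(x)" is voted out

rule = lambda e: {"on_runway(a)", "active(r)"} <= e
kept = aggregate_rules([rule], neg_examples=[{"on_runway(a)"}])
final = prune_rules(kept, pos_examples=[facts])  # rule survives
```

The sketch only conveys the control flow: each level discards evidence that the next level would otherwise propagate, which is how the framework limits error propagation from noisy neural extractions.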

Paper Structure

This paper contains 38 sections, 2 theorems, 15 equations, 6 figures, and 2 algorithms.

Key Result

theorem 1

Let $(E^{+}, E^{-})$ denote the aggregated positive and negative examples accepted by the four-level feedback mechanism, and let $H_{\mathrm{agg}}$ be the hypothesis produced at the end of Level 3 (before pruning). If every rule admitted during aggregation is negative-safe, then $H_{\mathrm{agg}}$ is correct on the training data: it covers every example in $E^{+}$ and no example in $E^{-}$.

Figures (6)

  • Figure 1: Overview of World2Rules framework. Textual incident reports and visual trajectory data are processed by Large Language Model (LLM) and Vision Language Model (VLM) based extractors to generate Inductive Logic Programming (ILP) inputs, which are decomposed into independent ILP subproblems. Consistency-driven feedback is applied at the example, subset, and rule levels to filter, aggregate, and prune hypotheses, yielding compact and interpretable symbolic safety rules.
  • Figure 2: Extraction pipeline for converting crash reports into ILP inputs. Each report is augmented with airport metadata, parsed by an LLM into typed entities and relations, and converted into background knowledge, bias, and example files. Validation ensures syntactic and semantic consistency before downstream learning.
  • Figure 3: Performance comparison across system variants using 300 violation reports. World2Rules achieves 94.0% F1, outperforming the LLM-only baseline (70.4%) and naïve ILP (50.8%) while maintaining perfect precision.
  • Figure 4: Data scaling for World2Rules. F1 score improves from 52.5% (10 reports) to 94.0% (300 reports), driven by recall gains while precision remains near 100%.
  • Figure 5: Sample collision rules learned by World2Rules. Each rule captures a distinct runway incursion pattern: holding on an active runway (Rule 1), crossing during landing (Rule 2), and co-occupation of a runway's extended area (Rule 3).
  • ...and 1 more figure
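
The collision rules in Figure 5 are first-order logic clauses characterizing unsafe world configurations. As an illustration of the general shape such a rule might take (the predicate names below are hypothetical, not the paper's learned vocabulary), a rule in the spirit of Rule 1, holding on an active runway, could read:

```latex
\mathit{unsafe}(A, B, R) \leftarrow
    \mathit{holding\_on}(A, R) \wedge
    \mathit{active}(R) \wedge
    \mathit{cleared\_to\_land}(B, R) \wedge
    A \neq B
```

Read declaratively: a configuration is unsafe when aircraft $A$ holds on runway $R$ while $R$ is active and a different aircraft $B$ is cleared to land on it. Compact clauses of this form are what make the learned rules amenable to inspection and formal analysis.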

Theorems & Definitions (5)

  • definition 1: Constructed ILP Instance
  • theorem 1: Pre-Pruning Training Correctness
  • proof
  • proposition 1: Safety Preservation under Support Pruning
  • proof