Knowledge-Grounded Agentic Large Language Models for Multi-Hazard Understanding from Reconnaissance Reports

Chenchen Kuai, Zihao Li, Braden Rosen, Stephanie Paal, Navid Jafari, Jean-Louis Briaud, Yunlong Zhang, Youssef M. A. Hashash, Yang Zhou

TL;DR

This paper tackles the challenge of extracting trustworthy, hazard-grounded knowledge from unstructured post-disaster reconnaissance reports. It introduces HazardRecQA, a dataset of question-answer pairs derived from GEER reconnaissance reports, and MoRA-RAG, a knowledge-grounded retrieval-augmented framework that combines Mixture-of-Retrieval routing with an agentic loop for validation and refinement. Empirical results show that MoRA-RAG significantly improves accuracy over zero-shot and standard RAG baselines, and that open-weight LLMs perform close to proprietary models when properly grounded. The work lays a foundation for reliable, domain-specific reasoning across multiple hazards and points to future multimodal extensions and adaptive learning for resilience analysis.

Abstract

Post-disaster reconnaissance reports contain critical evidence for understanding multi-hazard interactions, yet their unstructured narratives make systematic knowledge transfer difficult. Large language models (LLMs) offer new potential for analyzing these reports, but often generate unreliable or hallucinated outputs when domain grounding is absent. This study introduces the Mixture-of-Retrieval Agentic RAG (MoRA-RAG), a knowledge-grounded LLM framework that transforms reconnaissance reports into a structured foundation for multi-hazard reasoning. The framework integrates a Mixture-of-Retrieval mechanism that dynamically routes queries across hazard-specific databases while using agentic chunking to preserve contextual coherence during retrieval. It also includes a verification loop that assesses evidence sufficiency, refines queries, and initiates targeted searches when information remains incomplete. We construct HazardRecQA by deriving question-answer pairs from GEER reconnaissance reports, which document 90 global events across seven major hazard types. MoRA-RAG achieves up to 94.5 percent accuracy, outperforming zero-shot LLMs by 30 percent and state-of-the-art RAG systems by 10 percent, while reducing hallucinations across diverse LLM architectures. MoRA-RAG also enables open-weight LLMs to achieve performance comparable to proprietary models. It establishes a new paradigm for transforming post-disaster documentation into actionable, trustworthy intelligence for hazard resilience.
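
The routing and verification loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpora in `HAZARD_DBS`, the keyword-overlap scoring, the `required_terms` sufficiency check (a stand-in for an LLM judging whether the evidence is enough), and all function names are assumptions made for illustration only.

```python
"""Toy sketch of mixture-of-retrieval routing with a verification loop."""

# Hypothetical hazard-specific stores; a real system would hold report chunks.
HAZARD_DBS = {
    "earthquake": [
        "Liquefaction caused lateral spreading near the port.",
        "Ground shaking damaged unreinforced masonry buildings.",
    ],
    "flood": [
        "Levee overtopping flooded the low-lying district.",
        "Scour undermined bridge piers during the flood.",
    ],
    "landslide": [
        "Heavy rainfall triggered shallow landslides on steep slopes.",
    ],
}

STOPWORDS = {"the", "a", "and", "at", "what", "of", "on", "in", "during", "near"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def overlap(query, passage):
    """Keyword overlap; an assumed stand-in for a learned similarity score."""
    return len(tokenize(query) & tokenize(passage))


def route(query, top_k_dbs=1):
    """Pick the hazard databases whose best passage matches the query."""
    scores = {
        name: max(overlap(query, p) for p in passages)
        for name, passages in HAZARD_DBS.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, s in ranked[:top_k_dbs] if s > 0]


def retrieve(query, db_names, top_k=1):
    """Return the best-matching passages from the routed databases."""
    candidates = [p for name in db_names for p in HAZARD_DBS[name]]
    ranked = sorted(candidates, key=lambda p: overlap(query, p), reverse=True)
    return [p for p in ranked[:top_k] if overlap(query, p) > 0]


def missing_terms(required_terms, evidence):
    """Sufficiency check: which required terms does the evidence not cover?"""
    covered = set()
    for passage in evidence:
        covered |= tokenize(passage)
    return set(required_terms) - covered


def answer_with_verification(query, required_terms, max_rounds=3):
    """Retrieve, verify sufficiency, and refine the query until covered."""
    evidence = []
    current_query = query
    for round_idx in range(1, max_rounds + 1):
        for passage in retrieve(current_query, route(current_query)):
            if passage not in evidence:
                evidence.append(passage)
        missing = missing_terms(required_terms, evidence)
        if not missing:
            return evidence, round_idx
        # Refinement: target the next search at what is still missing.
        current_query = " ".join(sorted(missing))
    return evidence, max_rounds


if __name__ == "__main__":
    question = "What caused lateral spreading near the port, and was there scour?"
    found, rounds = answer_with_verification(question, {"spreading", "scour"})
    print(rounds, found)
```

In this toy setup, the first round routes the question to the earthquake store only, the sufficiency check finds that nothing covers scour, and the refined query triggers a targeted search that routes to the flood store. A practical system would replace the overlap score with dense retrieval and the term check with an LLM-based judge.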

Paper Structure

This paper contains 23 sections, 11 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Coverage and event distribution of GEER reconnaissance reports [GEER2025reportsdb].
  • Figure 1: Question type distribution across task categories: TF (True/False) and MC (Multiple-Choice).
  • Figure 2: Proportional distribution of total valid QA across hazard types (total QA).
  • Figure 3: Overview of a RAG system, which can extract and transfer hazard knowledge from external databases.
  • Figure 4: Architecture of the proposed MoRA-RAG framework compared with Vanilla RAG.
  • ...and 1 more figure