Multi-Query Focused Disaster Summarization via Instruction-Based Prompting

Philipp Seeberger, Korbinian Riedhammer

TL;DR

The paper tackles multi-stream disaster summarization by combining a two-stage retrieval pipeline (BM25 with Bo1 expansion and MonoT5 reranking) with an instruction-following LLM (LLaMA-Nuggets on LLaMA-2-13B) guided by QA-motivated prompts to extract query-relevant facts. Event nuggets are built by concatenating these facts up to a 200-character limit and scoring them by mean document relevance, enabling focused, traceable summaries per event-request pair. Empirical results show competitive performance against CrisisFACTS participants in both automatic and human evaluations, though gaps remain in recall and formatting, indicating room for improvement in prompt design and fact extraction. Overall, the work demonstrates the viability of open-source LLMs for disaster summarization and highlights practical considerations for traceability, prompt robustness, and evaluation in real-world emergency contexts.
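The nugget-building step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Fact` and `Nugget` classes, the `build_nuggets` function, the greedy packing order, and the truncation of over-long facts are all hypothetical choices. Only the 200-character limit and the idea of scoring by mean document relevance come from the summary.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

MAX_NUGGET_CHARS = 200  # character limit stated in the summary


@dataclass
class Fact:
    text: str
    # Relevance scores of the documents cited for this fact (assumed input).
    doc_scores: List[float]


@dataclass
class Nugget:
    text: str
    score: float


def _flush(texts: List[str], scores: List[float], max_chars: int) -> Nugget:
    # Assumption: an over-long nugget is simply truncated to the limit.
    text = " ".join(texts)[:max_chars]
    return Nugget(text, mean(scores) if scores else 0.0)


def build_nuggets(facts: List[Fact], max_chars: int = MAX_NUGGET_CHARS) -> List[Nugget]:
    """Greedily concatenate facts into nuggets of at most max_chars characters.

    Each nugget is scored by the mean relevance of all documents
    that back the facts it contains.
    """
    nuggets: List[Nugget] = []
    texts: List[str] = []
    scores: List[float] = []
    for fact in facts:
        candidate = " ".join(texts + [fact.text])
        # Start a new nugget once adding this fact would exceed the limit.
        if texts and len(candidate) > max_chars:
            nuggets.append(_flush(texts, scores, max_chars))
            texts, scores = [], []
        texts.append(fact.text)
        scores.extend(fact.doc_scores)
    if texts:
        nuggets.append(_flush(texts, scores, max_chars))
    return nuggets
```

Keeping the per-document scores attached to each fact means every nugget can still be traced back to the documents that support it, which matches the traceability goal mentioned in the summary.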

Abstract

Automatic summarization of mass-emergency events plays a critical role in disaster management. The second edition of CrisisFACTS aims to advance disaster summarization based on multi-stream fact-finding with a focus on web sources such as Twitter, Reddit, Facebook, and Webnews. Here, participants are asked to develop systems that can extract key facts from several disaster-related events, which ultimately serve as a summary. This paper describes our method to tackle this challenging task. We follow previous work and propose to use a combination of retrieval, reranking, and an embarrassingly simple instruction-following summarization. The two-stage retrieval pipeline relies on BM25 and MonoT5, while the summarizer module is based on the open-source Large Language Model (LLM) LLaMA-13b. For summarization, we explore a Question Answering (QA)-motivated prompting approach and find the evidence useful for extracting query-relevant facts. The automatic metrics and human evaluation show strong results but also highlight the gap between open-source and proprietary systems.
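
The two-stage retrieval idea from the abstract (a cheap lexical first stage followed by a more expensive reranker) can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the whitespace tokenizer, the BM25 parameters `k1` and `b`, the function names, and the cutoffs `first_stage_k` and `final_k`. The `rerank_fn` argument is a placeholder for a neural reranker such as MonoT5, which is not implemented here, and query expansion is omitted.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption for this sketch).
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    rerank_fn: Callable[[str, str], float],
    first_stage_k: int = 100,
    final_k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (doc_index, rerank_score) pairs for the top final_k documents."""
    first = bm25_scores(query, docs)
    # Stage 1: keep the best first_stage_k documents by BM25 score.
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:first_stage_k]
    # Stage 2: rescore only the candidates with the (expensive) reranker.
    reranked = sorted(
        ((i, rerank_fn(query, docs[i])) for i in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:final_k]
```

The point of the split is cost: the reranker is applied only to the `first_stage_k` candidates that survive the lexical stage, not to the whole collection.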

Paper Structure (22 sections, 2 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: High-level overview of our proposed system and prompting strategy. The upper part depicts the overall pipeline. We call the pipeline for each event-request-query triple separately, resulting in the final summaries for each event-request pair. The lower part illustrates the prompting strategy. We generate query-focused facts using a QA-motivated approach and concatenate the extracted facts to form the final event nuggets.
  • Figure 2: ROUGE-2 and BERTScore $F_1$ scores (×100) on reference summaries.
  • Figure 3: Qualitative analysis results for 30 LLM prompts and responses. Prompts: The fraction of prompts that contain at least one relevant document or only irrelevant documents. Responses: The fraction of responses that contain at least one relevant generated fact or that have an undesired output format.
  • Figure 4: Qualitative analysis results for 59 generated facts. We show the fractions of facts with surface issues, incorrect facts, and incomplete or incorrect citations.