Multi-Query Focused Disaster Summarization via Instruction-Based Prompting
Philipp Seeberger, Korbinian Riedhammer
TL;DR
The paper tackles multi-stream disaster summarization by combining a two-stage retrieval pipeline (BM25 with Bo1 expansion and MonoT5 reranking) with an instruction-following LLM (LLaMA-Nuggets on LLaMA-2-13B) guided by QA-motivated prompts to extract query-relevant facts. Event nuggets are built by concatenating these facts up to a 200-character limit and scoring them by mean document relevance, enabling focused, traceable summaries per event-request pair. Empirical results show competitive performance against CrisisFACTS participants in both automatic and human evaluations, though gaps remain in recall and formatting, indicating room for improvement in prompt design and fact extraction. Overall, the work demonstrates the viability of open-source LLMs for disaster summarization and highlights practical considerations for traceability, prompt robustness, and evaluation in real-world emergency contexts.
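The nugget construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Fact`, `Nugget`, and `build_nugget` are hypothetical, and the sketch assumes that facts are concatenated greedily in the given order, that concatenation stops at the first fact that would exceed the limit, and that the nugget score is the mean relevance score of the documents the kept facts came from.

```python
from dataclasses import dataclass
from statistics import mean

MAX_NUGGET_CHARS = 200  # character limit mentioned in the summary above


@dataclass
class Fact:
    text: str         # fact extracted by the LLM (hypothetical container)
    doc_score: float  # relevance score of the source document
    doc_id: str       # kept so the nugget stays traceable to its sources


@dataclass
class Nugget:
    text: str
    score: float
    sources: list


def build_nugget(facts, max_chars=MAX_NUGGET_CHARS, sep=" "):
    """Greedily concatenate facts until the next one would exceed max_chars.

    Assumption: stop at the first fact that does not fit, rather than
    skipping it and trying later, shorter facts.
    """
    kept = []
    length = 0
    for fact in facts:
        # The separator is only needed between facts, not before the first.
        extra = len(fact.text) + (len(sep) if kept else 0)
        if length + extra > max_chars:
            break
        kept.append(fact)
        length += extra
    if not kept:
        return None  # even the first fact was longer than the limit
    return Nugget(
        text=sep.join(f.text for f in kept),
        score=mean(f.doc_score for f in kept),  # mean document relevance
        sources=[f.doc_id for f in kept],
    )
```

Keeping the document IDs alongside the text is one simple way to preserve the traceability the summary emphasizes; other designs, such as storing character offsets into the source document, would serve the same purpose.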
Abstract
Automatic summarization of mass-emergency events plays a critical role in disaster management. The second edition of CrisisFACTS aims to advance disaster summarization based on multi-stream fact-finding, with a focus on web sources such as Twitter, Reddit, Facebook, and web news. Participants are asked to develop systems that extract key facts from several disaster-related events; these facts ultimately serve as a summary. This paper describes our method for tackling this challenging task. We follow previous work and propose a combination of retrieval, reranking, and an embarrassingly simple instruction-following summarizer. The two-stage retrieval pipeline relies on BM25 and MonoT5, while the summarizer module is based on the open-source Large Language Model (LLM) LLaMA-13b. For summarization, we explore a Question Answering (QA)-motivated prompting approach and find the evidence useful for extracting query-relevant facts. The automatic metrics and human evaluation show strong results but also highlight the gap between open-source and proprietary systems.
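To make the retrieve-then-rerank idea concrete, the following is a minimal, self-contained Python sketch under stated assumptions. It is not the authors' system: BM25 is reimplemented here in a few lines with whitespace tokenization, Bo1 query expansion is omitted, and the MonoT5 reranker is replaced by a placeholder callable `rerank_fn` that the caller supplies. The function names and the default cutoffs `first_stage_k` and `final_k` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_then_rerank(query, docs, rerank_fn, first_stage_k=100, final_k=10):
    """Stage 1: keep the top BM25 candidates. Stage 2: reorder them with rerank_fn.

    rerank_fn(query, doc) -> float stands in for a neural reranker such as MonoT5.
    """
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = order[:first_stage_k]
    reranked = sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:final_k]]
```

The point of the two-stage design is that the cheap lexical scorer narrows the collection to a small candidate set, so the more expensive reranker only has to score those candidates rather than every document.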
