Monitoring Critical Infrastructure Facilities During Disasters Using Large Language Models

Abdul Wahab Ziaullah, Ferda Ofli, Muhammad Imran

TL;DR

This work investigates using large language models to monitor critical infrastructure facilities (CIFs) during disasters by analyzing social media signals. It proposes a methodology built on two pipelines: (1) data collection and indexing, which draws CIFs from OpenStreetMap and embeds both synthetic and real tweets, and (2) retrieval, followed by zero-shot LLM classification and inference of each CIF's impact, severity, and operational status. The study evaluates retrieval and classification performance for two disaster areas of interest (AOIs), Broward County and Christchurch. It finds that LLMs classify well, but inference remains challenging when inputs are noisy and prompts are lengthy, and overall retrieval effectiveness is low. The findings point to the potential of zero-shot, LLM-based disaster monitoring and outline practical improvements for real-world adoption: better retrieval quality, more realistic data, and integration of multiple models.
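The retrieval step of the second pipeline can be pictured as a nearest-neighbour search over tweet embeddings. What follows is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the scoring function and uses toy three-dimensional vectors in place of real model embeddings. The names cosine, retrieve, index, and query are hypothetical.

```python
import math

# Assumption: cosine-similarity scoring over precomputed embeddings;
# toy vectors stand in for the output of a real embedding model.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=3):
    """Return the ids of the k indexed tweets most similar to the query vector.

    `index` maps a tweet id to its embedding vector.
    """
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [tweet_id for tweet_id, _ in scored[:k]]


# Hypothetical toy index of three tweets.
index = {
    "t1": [0.9, 0.1, 0.0],  # e.g. a tweet about a flooded hospital
    "t2": [0.0, 1.0, 0.0],  # an unrelated tweet
    "t3": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of a CIF query such as "hospital damaged"
print(retrieve(query, index, k=2))
```

With these toy vectors the query is closest to t1 and then t3, so the call prints ['t1', 't3']. Cosine similarity ignores vector length, which is why it is a common choice for comparing text embeddings; a system at scale would more likely use an approximate vector index than a full sort.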

Abstract

Critical Infrastructure Facilities (CIFs), such as healthcare and transportation facilities, are vital for the functioning of a community, especially during large-scale emergencies. In this paper, we explore a potential application of Large Language Models (LLMs) to monitor the status of CIFs affected by natural disasters through information disseminated on social media networks. To this end, we analyze social media data from two disaster events in two different countries to identify reported impacts on CIFs as well as their impact severity and operational status. We employ state-of-the-art open-source LLMs to perform computational tasks including retrieval, classification, and inference, all in a zero-shot setting. Through extensive experimentation, we report the results of these tasks using standard evaluation metrics and reveal insights into the strengths and weaknesses of LLMs. We note that although LLMs perform well in classification tasks, they encounter challenges with inference tasks, especially when the context or prompt is complex and lengthy. Additionally, we outline various potential directions for future exploration that can be beneficial during the initial adoption phase of LLMs for disaster response tasks.
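The classification and inference tasks are run zero-shot: the model receives instructions and context but no labeled examples. A minimal sketch of how such a prompt might be assembled and the model's answer validated is given here. The label sets, the JSON answer format, and the function names build_prompt and parse_reply are assumptions for illustration, and the reply string is a stand-in for an actual LLM call.

```python
import json

# Hypothetical label sets for illustration; the paper's actual categories may differ.
SEVERITY_LABELS = ["none", "mild", "moderate", "severe"]
STATUS_LABELS = ["operational", "partially operational", "non-operational", "unknown"]


def build_prompt(cif_name, tweets):
    """Assemble a zero-shot prompt: task instructions plus context, no labeled examples."""
    context = "\n".join(f"- {t}" for t in tweets)
    return (
        f"You are monitoring the facility '{cif_name}' during a disaster.\n"
        f"Tweets:\n{context}\n\n"
        "Answer in JSON with keys 'impacted' (true or false), "
        f"'severity' (one of {', '.join(SEVERITY_LABELS)}) and "
        f"'status' (one of {', '.join(STATUS_LABELS)})."
    )


def parse_reply(reply):
    """Parse the model's JSON answer and reject values outside the assumed schema."""
    data = json.loads(reply)
    if not isinstance(data.get("impacted"), bool):
        raise ValueError(f"'impacted' must be a boolean: {reply!r}")
    if data.get("severity") not in SEVERITY_LABELS or data.get("status") not in STATUS_LABELS:
        raise ValueError(f"unexpected label in reply: {reply!r}")
    return {"impacted": data["impacted"], "severity": data["severity"], "status": data["status"]}


prompt = build_prompt("Example Hospital", ["Power is out at Example Hospital", "ER still accepting patients"])
# Stand-in for the text an LLM would return for `prompt`.
reply = '{"impacted": true, "severity": "moderate", "status": "partially operational"}'
print(parse_reply(reply))
```

Checking the reply against fixed label sets guards against free-form answers, which zero-shot models sometimes produce instead of the requested format.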

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: High-level methodology detailing two pipelines: (i) data generation, collection, and indexing, (ii) data retrieval, classification, and analysis
  • Figure 2: Distribution of impact labels in the synthetic data for (a) Broward County and (b) Christchurch. The outer charts with light blue bars correspond to the LLM-generated raw tags whereas the overlaid smaller charts with dark blue bars show the manually pruned ground-truth tags.
  • Figure 3: Performance comparison of retrieval queries for Broward County
  • Figure 4: Performance comparison of retrieval queries for Christchurch
  • Figure 5: Signal-to-noise distribution of retrieved tweets for each CIF in (a) Broward County and (b) Christchurch
  • ...and 2 more figures
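Figure 5 reports a signal-to-noise distribution of retrieved tweets per CIF. Assuming that "signal" denotes a retrieved tweet judged relevant to its CIF and "noise" one judged irrelevant (an interpretation, since the exact definition is not stated here), the per-CIF tallies and the resulting precision could be computed as in this sketch. The function names, CIF identifiers, and relevance judgments are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: signal = retrieved tweet judged relevant to its CIF, noise = judged irrelevant.


def signal_noise_by_cif(retrieved):
    """Count signal vs noise retrieved tweets per CIF.

    `retrieved` is a list of (cif_id, is_relevant) pairs, one per retrieved tweet.
    Returns {cif_id: (signal_count, noise_count)} in first-seen order.
    """
    counts = defaultdict(Counter)
    for cif_id, is_relevant in retrieved:
        counts[cif_id]["signal" if is_relevant else "noise"] += 1
    # Counter returns 0 for missing keys, so CIFs with no signal report (0, n).
    return {cif: (c["signal"], c["noise"]) for cif, c in counts.items()}


def precision(signal, noise):
    """Fraction of retrieved tweets that are relevant (0.0 when nothing was retrieved)."""
    total = signal + noise
    return signal / total if total else 0.0


# Hypothetical relevance judgments for tweets retrieved for two CIFs.
retrieved = [("hospital_A", True), ("hospital_A", False), ("airport_B", False), ("hospital_A", True)]
for cif, (s, n) in signal_noise_by_cif(retrieved).items():
    print(cif, s, n, round(precision(s, n), 2))
```

Here hospital_A has 2 relevant and 1 irrelevant tweet (precision 0.67), while airport_B has only noise (precision 0.0), the kind of per-facility imbalance a signal-to-noise plot makes visible.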