Signals from the Floods: AI-Driven Disaster Analysis through Multi-Source Data Fusion

Xian Gong, Paul X. McCarthy, Lin Tian, Marian-Andrei Rizoiu

TL;DR

The paper tackles the problem of extracting actionable crisis insights from noisy social media posts and structured public submissions during floods, by fusing the two data sources and applying ML-based relevance filtering. It introduces an integrated framework that uses Latent Dirichlet Allocation (LDA) for topic discovery, LongT5 embeddings for cross-source alignment, and a Relevance Index to prioritise flood-relevant tweets. The study reveals distinct discourse in public submissions versus social media, and shows that relevance filtering improves real-time situational awareness and informs policy planning. The approach offers practical value for emergency responders and resilience planners; future work includes richer multimodal analyses and dataset expansion.
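
The TL;DR names LDA as the topic-discovery step. The summary does not give the implementation, preprocessing, or hyperparameters used, so what follows is only a minimal sketch of standard LDA fitted by collapsed Gibbs sampling in pure Python. The function name `lda_gibbs` and the default values of `alpha`, `beta`, and `iterations` are hypothetical; a corpus of 55,000+ tweets would normally be handled by an optimised library implementation rather than this loop.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def lda_gibbs(
    docs: List[List[str]],
    num_topics: int,
    alpha: float = 0.1,   # hypothetical document-topic Dirichlet prior
    beta: float = 0.01,   # hypothetical topic-word Dirichlet prior
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[Dict[str, float]]]:
    """Fit LDA by collapsed Gibbs sampling on pre-tokenised documents (sketch).

    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k][w] is the estimated p(w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]    # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in topics]  # occurrences of word w assigned to topic k
    n_k = [0] * num_topics                     # total tokens assigned to topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab} for k in topics
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Toy documents, invented for illustration only.
    docs = [
        "rain river rising levee rain".split(),
        "rescue boat evacuation rescue".split(),
        "river levee rain warning".split(),
        "evacuation boat rescue centre".split(),
    ]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
    for k, dist in enumerate(topic_word):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {k}: {top}")
```

Collapsed Gibbs sampling is used here only because it fits in a few dozen lines of standard-library code; library implementations more commonly use variational inference.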

Abstract

Massive and diverse web data are increasingly vital for government disaster response, as demonstrated by the 2022 floods in New South Wales (NSW), Australia. This study examines how X (formerly Twitter) and public inquiry submissions provide insights into public behaviour during crises. We analyse more than 55,000 flood-related tweets and 1,450 submissions to identify behavioural patterns during extreme weather events. While social media posts are short and fragmented, inquiry submissions are detailed, multi-page documents offering structured insights. Our methodology integrates Latent Dirichlet Allocation (LDA) for topic modelling with Large Language Models (LLMs) to enhance semantic understanding. LDA reveals distinct opinions and geographic patterns, while LLMs improve filtering by identifying flood-relevant tweets using public submissions as a reference. This filtering method, the Relevance Index, reduces noise and prioritises actionable content, improving situational awareness for emergency responders. By combining these complementary data streams, our approach introduces a novel AI-driven method to refine crisis-related social media content, improve real-time disaster response, and inform long-term resilience planning.
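
The Relevance Index step can be illustrated with a minimal sketch. The exact formula is not reproduced in this summary, so the code assumes one plausible definition: the cosine similarity between a tweet's embedding and the centroid of the reference submission embeddings, with tweets kept if they reach a hypothetical `threshold`. Embeddings are taken as given (the paper uses LongT5); the three-dimensional vectors in the demo are invented stand-ins, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of the reference (submission) embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relevance_index(tweet_vecs: Sequence[Vector], reference_vecs: Sequence[Vector]) -> List[float]:
    """Hypothetical Relevance Index: similarity of each tweet to the reference centroid."""
    ref = centroid(reference_vecs)
    return [cosine(t, ref) for t in tweet_vecs]


def filter_relevant(tweets, tweet_vecs, reference_vecs, threshold=0.5) -> List[Tuple[str, float]]:
    """Rank tweets by relevance and keep those at or above the (hypothetical) threshold."""
    scores = relevance_index(tweet_vecs, reference_vecs)
    ranked = sorted(zip(tweets, scores), key=lambda pair: pair[1], reverse=True)
    return [(text, score) for text, score in ranked if score >= threshold]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for LongT5 embeddings (invented for illustration).
    submissions = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
    tweets = ["River rising fast, evacuation centre open", "New phone just arrived"]
    tweet_vecs = [[0.95, 0.05, 0.05], [0.0, 1.0, 0.0]]
    for text, score in filter_relevant(tweets, tweet_vecs, submissions):
        print(f"{score:.3f}  {text}")
```

Scoring against a single centroid is the simplest option; a nearest-neighbour variant (maximum similarity to any one submission) would instead favour tweets that closely match one specific concern raised in the submissions.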

Paper Structure

This paper contains 12 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Visualisation of tweet and public submission embeddings using 2D UMAP. (a) Grey points represent 55,724 tweets, while coloured points indicate 1,450 public submissions. The black circle highlights the core cluster of submissions, which serves as a reference corpus for relevance filtering. (b) Topic distribution of public submissions within the black circle, categorised into Home & Family, Public Concerns, Attribution & Recovery, and Environmental Governance. For detailed topic descriptions, see Table 1.
  • Figure 1: Geographical distribution of public submissions by topic. For detailed topic descriptions, see Table 1.
  • Figure 2: Submission topics by respondent type. The figure illustrates the proportion of each topic (Home & Family, Public Concerns, Attribution & Recovery, and Environmental Governance) across different categories of submitters, including flood-affected residents, business owners, emergency personnel, and academics.
  • Figure 2: Geographical distribution of tweets by topic. For detailed topic descriptions, see Table 1.
  • Figure 3: Topic distribution of daily tweet volume. Initially, weather updates and rescue efforts dominated, peaking during major flood events. Over time, discussions shifted toward long-term recovery, political debates on future preparedness, and increasing public sentiment, while real-time news coverage declined.
  • ...and 2 more figures
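
The respondent-type and daily tweet-volume figures both report topic proportions within groups. The captions do not say how items are assigned to topics, so the sketch below assumes each tweet or submission already carries a single topic label (for example, its highest-share LDA topic) and a group key such as a date or respondent type. The function name `topic_shares` is hypothetical and the demo records are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def topic_shares(records: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, Dict[str, float]]:
    """Hypothetical helper: turn (group, topic) pairs into per-group topic proportions.

    `group` can be a day (for daily tweet volume) or a respondent type
    (for submissions); each record is assumed to carry one topic label.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for group, topic in records:
        counts[group][topic] += 1
    shares = {}
    for group, c in counts.items():
        total = sum(c.values())  # raw volume for this group
        shares[group] = {topic: n / total for topic, n in c.items()}
    return shares


if __name__ == "__main__":
    # Invented example: dominant topic of each submission and the submitter type.
    records = [
        ("flood-affected resident", "Home & Family"),
        ("flood-affected resident", "Home & Family"),
        ("flood-affected resident", "Public Concerns"),
        ("academic", "Environmental Governance"),
    ]
    for group, dist in topic_shares(records).items():
        print(group, dist)
```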