Generalizing to Unseen Disaster Events: A Causal View

Philipp Seeberger, Steffen Freisinger, Tobias Bocklet, Korbinian Riedhammer

TL;DR

This work addresses the challenge of generalizing disaster-related tweet classification to unseen events by adopting a causal perspective on event-related and domain biases. It introduces a debiasing framework with a bias model to capture direct event effects, domain-aware experts to balance cross-domain attention, and masking augmentation to generate counterfactuals, all trained end-to-end and evaluated on three real-world Twitter datasets. The approach yields consistent macro-F1 gains over strong baselines and competes with instruction-following LLMs while offering significantly lower latency for real-time disaster response. Limitations include reliance on English Twitter data and bias-feature detection, with future work aimed at broader language coverage, temporal/spatial bias handling, and augmentation via LLMs. The work advances practical, causally principled bias mitigation for rapid, reliable disaster information extraction.

Abstract

Due to the rapid growth of social media platforms, these tools have become essential for monitoring information during ongoing disaster events. However, extracting valuable insights requires real-time processing of vast amounts of data. A major challenge in existing systems is their exposure to event-related biases, which negatively affects their ability to generalize to emerging events. While recent advancements in debiasing and causal learning offer promising solutions, they remain underexplored in the disaster event domain. In this work, we approach bias mitigation through a causal lens and propose a method to reduce event- and domain-related biases, enhancing generalization to future events. Our approach outperforms multiple baselines by up to +1.9% F1 and significantly improves a PLM-based classifier across three disaster classification tasks.

Paper Structure

This paper contains 39 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of the proposed framework. The masking augmentation and bias model are used only during training. During inference, the bias model is removed to obtain debiased predictions. The experts and predictor correspond to the main model, and only a single expert’s output $R_q$ is used for the final prediction.
  • Figure 2: Macro F1 scores (averaged over 25 runs) for the designed probing tasks and our focused models. The random baseline refers to the information type classification task.
  • Figure 3: Macro F1 scores for different PLM encoders. Baseline represents a simple classification head. The results are the average of five runs.
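
The figures and abstract report macro F1, averaged over several runs. As a reading aid, the following minimal sketch shows how such a score can be computed: per-class F1 is calculated separately and then averaged with equal weight per class, so rare classes count as much as frequent ones. The function names, label strings, and toy predictions are hypothetical illustrations, not taken from the paper or its code.

```python
from collections import Counter
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return mean(scores)


def average_over_runs(y_true, runs):
    """Mean macro F1 across several prediction lists (e.g. different seeds)."""
    return mean(macro_f1(y_true, y_pred) for y_pred in runs)


if __name__ == "__main__":
    # Toy labels (hypothetical), standing in for tweet categories.
    gold = ["damage", "damage", "rescue", "other"]
    run_a = ["damage", "rescue", "rescue", "other"]
    run_b = ["damage", "damage", "rescue", "other"]
    print(round(macro_f1(gold, run_a), 4))
    print(round(average_over_runs(gold, [run_a, run_b]), 4))
```

In practice, scikit-learn's `f1_score` with `average="macro"` computes the same quantity; the hand-written version above only makes the per-class averaging explicit.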