CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification

Pingchuan Ma, Chengshuai Zhao, Bohan Jiang, Saketh Vishnubhatla, Ujun Jeong, Alimohammad Beigi, Adrienne Raglin, Huan Liu

TL;DR

Crisis-context models struggle to generalize to unseen disasters due to entangled causal and spurious features and misaligned multimodal representations. CAMO introduces a causality-guided adversarial disentanglement module to extract domain-invariant causal signals and a unified representation learning module to align visual-textual features into a shared space, enabling single-modality DG techniques for multimodal data. The approach yields 4%–21% improvements over strong baselines on CrisisMMD and DMD under leave-one-domain-out evaluation, supported by ablations and visual analyses. This work enhances the reliability of crisis monitoring systems by grounding predictions in stable causal mechanisms across diverse disaster contexts.
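The leave-one-domain-out protocol mentioned above can be sketched as follows: each disaster type is held out once as the unseen test domain while the remaining types form the training set. This is a minimal illustration, not the authors' evaluation code; the `(post, label, domain)` tuple layout, the function name `leave_one_domain_out`, and the toy domain names are assumptions made for the example.

```python
from collections import defaultdict


def leave_one_domain_out(samples):
    """Yield (held_out_domain, train, test) once per domain.

    `samples` is a list of (post, label, domain) tuples; this layout is
    an assumption of the sketch, not a format defined by the paper.
    """
    by_domain = defaultdict(list)
    for sample in samples:
        by_domain[sample[2]].append(sample)

    for held_out in sorted(by_domain):
        # Train on every domain except the held-out one.
        train = [
            s
            for domain, group in by_domain.items()
            if domain != held_out
            for s in group
        ]
        # Test only on the unseen (held-out) domain.
        test = list(by_domain[held_out])
        yield held_out, train, test


# Toy posts; the disaster types here are illustrative only.
posts = [
    ("post_a", 1, "earthquake"),
    ("post_b", 0, "earthquake"),
    ("post_c", 1, "flood"),
    ("post_d", 0, "wildfire"),
]
splits = list(leave_one_domain_out(posts))
```

Each split keeps the held-out disaster type entirely out of training, so the reported score reflects performance on a crisis type the model has never seen.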

Abstract

Crisis classification in social media aims to extract actionable disaster-related information from multimodal posts, a crucial task for enhancing situational awareness and facilitating timely emergency responses. However, the wide variation in crisis types makes generalizable performance across unseen disasters a persistent challenge. Existing approaches primarily leverage deep learning to fuse textual and visual cues for crisis classification, achieving reasonable results under in-domain settings. However, they generalize poorly to unseen crisis types because they (1) do not disentangle spurious and causal features, resulting in performance degradation under domain shift, and (2) fail to align heterogeneous modality representations within a shared space, which hinders the direct adaptation of established single-modality domain generalization (DG) techniques to the multimodal setting. To address these issues, we introduce a causality-guided multimodal domain generalization (MMDG) framework that combines adversarial disentanglement with unified representation learning for crisis classification. The adversarial objective encourages the model to disentangle and focus on domain-invariant causal features, leading to more generalizable classifications grounded in stable causal mechanisms. The unified representation aligns features from different modalities within a shared latent space, enabling single-modality DG strategies to be seamlessly extended to multimodal learning. Experiments on different datasets demonstrate that our approach achieves the best performance in unseen disaster scenarios.
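To illustrate why a shared latent space matters for reusing single-modality DG techniques, the sketch below applies standard Mix-up (named in Figure 1 as an example of such a technique) to two feature vectors that are assumed to already live in one unified space. How CAMO actually produces its unified representation is not shown here; the vectors, the function names `mixup` and `sample_lambda`, and the `alpha` default are hypothetical choices for this example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard Mix-up: convex combination of two feature vectors
    and of their one-hot labels, using the same weight `lam`."""
    if len(x_a) != len(x_b):
        # Interpolation is only defined when both samples share one space.
        raise ValueError("Mix-up needs features of the same dimension")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha), as in standard Mix-up.
    The default alpha is an assumption, not a value from the paper."""
    return rng.betavariate(alpha, alpha)


# Hypothetical unified (image + text) vectors for two posts.
z_1, y_1 = [0.2, 0.8, 0.0, 0.4], [1.0, 0.0]
z_2, y_2 = [0.6, 0.0, 0.4, 0.0], [0.0, 1.0]

# A fixed weight keeps the example deterministic.
mixed_x, mixed_y = mixup(z_1, y_1, z_2, y_2, lam=0.25)
```

Because each post is represented by a single vector with a single label, one mixing weight applies consistently to both the features and the label; when each modality has its own separate representation, there is no such single vector to interpolate.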

Paper Structure

This paper contains 21 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: (a) Model performance drops significantly when deployed on unseen disaster types. (b) Directly applying single-modality DG techniques is not feasible in the MMDG problem. For example, Mix-up produces inconsistent labels.
  • Figure 2: The CAMO framework consists of (1) unified representation learning that disentangles and aligns modal-general features for direct use of DG methods, and (2) causality-guided adversarial disentanglement that isolates domain-invariant features to enhance domain generalization.
  • Figure 3: Causal graph showing the data-generating mechanism in the disaster classification problem.
  • Figure 4: t-SNE visualization of the feature representations used by the baselines and CAMO. Features obtained by CAMO exhibit a smaller domain gap across different domains.
  • Figure 5: Grad-CAM visualization comparing CLMC and CAMO. While CLMC often focuses on spurious correlations such as background regions, CAMO consistently attends to causal, semantically relevant areas, showing improved generalization.
  • ...and 1 more figures