CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification
Pingchuan Ma, Chengshuai Zhao, Bohan Jiang, Saketh Vishnubhatla, Ujun Jeong, Alimohammad Beigi, Adrienne Raglin, Huan Liu
TL;DR
Crisis classification models struggle to generalize to unseen disasters because causal and spurious features are entangled and multimodal representations are misaligned. CAMO introduces a causality-guided adversarial disentanglement module that extracts domain-invariant causal signals, and a unified representation learning module that aligns visual and textual features in a shared space so that single-modality domain generalization (DG) techniques can be applied to multimodal data. Under leave-one-domain-out evaluation on CrisisMMD and DMD, the approach improves on strong baselines by 4%–21%, and ablations and visual analyses support these results. By grounding predictions in causal mechanisms that remain stable across diverse disaster contexts, this work makes crisis monitoring systems more reliable.
Abstract
Crisis classification on social media aims to extract actionable disaster-related information from multimodal posts, a crucial task for enhancing situational awareness and enabling timely emergency response. However, because crisis types vary widely, achieving generalizable performance on unseen disasters remains a persistent challenge. Existing approaches primarily use deep learning to fuse textual and visual cues and achieve reasonable results in in-domain settings. Yet they generalize poorly to unseen crisis types because they (1) do not disentangle spurious from causal features, which degrades performance under domain shift, and (2) fail to align heterogeneous modality representations in a shared space, which prevents established single-modality domain generalization (DG) techniques from being adapted directly to the multimodal setting. To address these issues, we introduce a causality-guided multimodal domain generalization (MMDG) framework for crisis classification that combines adversarial disentanglement with unified representation learning. The adversarial objective encourages the model to separate causal from spurious features and to rely on the domain-invariant causal ones, yielding classifications that generalize better because they are grounded in stable causal mechanisms. The unified representation aligns features from different modalities in a shared latent space, so that single-modality DG strategies extend naturally to multimodal learning. Experiments on two datasets, CrisisMMD and DMD, show that our approach achieves the best performance in unseen disaster scenarios.
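The "unseen disaster" setting summarized in the TL;DR is evaluated with a leave-one-domain-out protocol: each disaster domain is held out in turn as the test set while the model trains on all remaining domains. The snippet below is a minimal sketch of that split, assuming each post carries a `domain` label; the function name `leave_one_domain_out`, the field names, and the toy domain labels (`earthquake`, `flood`, `hurricane`) are hypothetical, and the paper's actual domain definitions and any validation splits for CrisisMMD and DMD may differ.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_domain_out(
    samples: Sequence[dict], domain_key: str = "domain"
) -> Iterator[Tuple[str, List[dict], List[dict]]]:
    """Yield (held_out_domain, train, test) once per domain.

    Hypothetical helper: every fold trains on all domains except one and
    tests on the held-out domain, simulating deployment on an unseen disaster.
    """
    # Group samples by their (assumed) domain label.
    by_domain: Dict[str, List[dict]] = defaultdict(list)
    for sample in samples:
        by_domain[sample[domain_key]].append(sample)

    # Iterate over domains in a deterministic (sorted) order.
    for held_out in sorted(by_domain):
        train = [s for d, group in by_domain.items() if d != held_out for s in group]
        test = list(by_domain[held_out])
        yield held_out, train, test


# Toy posts with hypothetical disaster labels (not real dataset records).
posts = [
    {"id": 0, "domain": "flood"},
    {"id": 1, "domain": "earthquake"},
    {"id": 2, "domain": "flood"},
    {"id": 3, "domain": "hurricane"},
]

for held_out, train, test in leave_one_domain_out(posts):
    print(held_out, len(train), len(test))
# earthquake 3 1
# flood 2 2
# hurricane 3 1
```

Because no post from the held-out domain ever appears in training, any gain under this protocol reflects generalization to a new disaster type rather than memorization of domain-specific cues.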
