Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment
Adrian-Dinu Urse, Dumitru-Clementin Cercel, Florin Pop
TL;DR
This study tackles disaster assessment from social media by addressing data scarcity and class imbalance with targeted augmentation. It evaluates diffusion-based image augmentations (Real Guidance, DiffuseMix) and text augmentations (back-translation, transformer paraphrasing, image-caption augmentation) across unimodal, multimodal, and multi-view setups on CrisisMMD. Image augmentations improve performance on underrepresented classes and text augmentations yield overall gains, though multi-view learning requires more careful alignment and training. The work provides practical augmentation strategies that enhance robustness for crisis-aware systems and informs future research on multimodal disaster analysis.
Abstract
Natural disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. However, existing datasets suffer from class imbalance and limited samples, which makes effective model development challenging. This paper explores augmentation techniques to address these issues on the CrisisMMD multimodal dataset. For visual data, we apply diffusion-based methods, namely Real Guidance and DiffuseMix. For text data, we explore back-translation, paraphrasing with transformers, and image caption-based augmentation. We evaluate these techniques across unimodal, multimodal, and multi-view learning setups. Results show that selected augmentations improve classification performance, particularly for underrepresented classes, while multi-view learning shows potential but requires further refinement. This study highlights effective augmentation strategies for building more robust disaster assessment systems.
