Multimodal Learning with Augmentation Techniques for Natural Disaster Assessment

Adrian-Dinu Urse, Dumitru-Clementin Cercel, Florin Pop

TL;DR

This study tackles disaster assessment from social media by addressing data scarcity and class imbalance with targeted augmentation. It evaluates diffusion-based image augmentations (Real Guidance, DiffuseMix) and text augmentations (back-translation, transformer paraphrasing, image-caption augmentation) across unimodal, multimodal, and multi-view setups on CrisisMMD. Key findings show improved performance for underrepresented classes with image augmentations and overall gains from text augmentations, though multi-view learning requires more careful alignment and training. The work provides practical augmentation strategies that enhance robustness for crisis-aware systems and informs future research on multimodal disaster analysis.

Abstract

Natural disaster assessment relies on accurate and rapid access to information, with social media emerging as a valuable real-time source. However, existing datasets suffer from class imbalance and limited samples, which makes effective model development challenging. This paper explores augmentation techniques to address these issues on the CrisisMMD multimodal dataset. For visual data, we apply diffusion-based methods, namely Real Guidance and DiffuseMix. For text data, we explore back-translation, paraphrasing with transformers, and image caption-based augmentation. We evaluate these techniques across unimodal, multimodal, and multi-view learning setups. Results show that selected augmentations improve classification performance, particularly for underrepresented classes, while multi-view learning shows potential but requires further refinement. This study highlights effective augmentation strategies for building more robust disaster assessment systems.

Paper Structure

This paper contains 27 sections, 1 figure, and 14 tables.

Figures (1)

  • Figure 1: Our multi-view learning architecture.