MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning

Seong-Hyeon Hwang; Soyoung Choi; Steven Euijong Whang

TL;DR

MIDAS tackles modality imbalance in multimodal learning by treating misaligned samples as informative supervision. It generates misaligned pairs and labels them with a unimodal-confidence-based soft target, then strengthens learning from weaker modalities via a dynamic weak-modality weight and prioritizes harder, more semantically ambiguous misaligned samples through hard-sample weighting. The approach yields consistent improvements over strong baselines across multiple datasets, demonstrating improved modality balance and discriminative power. This data-centric augmentation offers a practical path to robust, balanced multimodal representations with potential applicability beyond classification.

Abstract

Multimodal models often over-rely on dominant modalities and therefore fail to achieve optimal performance. While prior work focuses on modifying training objectives or optimization procedures, data-centric solutions remain underexplored. We propose MIDAS, a novel data augmentation strategy that generates misaligned samples with semantically inconsistent cross-modal information and labels them using unimodal confidence scores, compelling the model to learn from contradictory signals. However, this confidence-based labeling can still favor the more confident modality. To address this within our misaligned samples, we introduce weak-modality weighting, which dynamically increases the loss weight of the least confident modality, thereby helping the model fully utilize the weaker modality. Furthermore, misaligned samples whose features are more similar to those of aligned samples pose a greater challenge, and learning from them helps the model better distinguish between classes. To leverage this, we propose hard-sample weighting, which prioritizes such semantically ambiguous misaligned samples. Experiments on multiple multimodal classification benchmarks demonstrate that MIDAS significantly outperforms related baselines in addressing modality imbalance.
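The abstract names three mechanisms but gives no formulas. As a rough illustration only, the following plain-Python sketch shows one possible way they could fit together for two modalities A and B. Every concrete rule here is an assumption, not the authors' method: the pairing rule (modality A from one sample, modality B from a sample of a different class), the soft target (label mass split between the two source classes in proportion to each modality's unimodal confidence), the weak-modality boost `1 + gamma * |c_a - c_b|`, and the hard-sample weight (a softmax over cosine similarities between misaligned and aligned features, rescaled to mean 1). The names `make_misaligned_pairs`, `soft_target`, `weak_modality_weights`, `hard_sample_weights`, `misaligned_loss`, `gamma`, and `tau` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: function names, weighting rules, and the gamma/tau
# parameters are illustrative assumptions, not the paper's formulas.


def make_misaligned_pairs(labels: Sequence[int], rng: random.Random) -> List[Tuple[int, int]]:
    """Pair each sample i with a random sample j of a different class.
    Modality A would be taken from i and modality B from j."""
    pairs = []
    for i, y_i in enumerate(labels):
        candidates = [j for j, y_j in enumerate(labels) if y_j != y_i]
        if candidates:
            pairs.append((i, rng.choice(candidates)))
    return pairs


def soft_target(c_a: float, c_b: float) -> Tuple[float, float]:
    """Split label mass between the two source classes in proportion to each
    modality's unimodal confidence in its own source class (assumed rule)."""
    total = c_a + c_b
    return c_a / total, c_b / total


def weak_modality_weights(c_a: float, c_b: float, gamma: float = 1.0) -> Tuple[float, float]:
    """Boost the loss weight of the less confident modality by the confidence gap."""
    boost = 1.0 + gamma * abs(c_a - c_b)
    if c_a < c_b:
        return boost, 1.0
    if c_b < c_a:
        return 1.0, boost
    return 1.0, 1.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def hard_sample_weights(mis_feats, aligned_feats, tau: float = 1.0) -> List[float]:
    """Give misaligned samples whose features are closer to the aligned features
    a larger weight: softmax over similarities, rescaled so the batch mean is 1."""
    sims = [cosine(m, a) for m, a in zip(mis_feats, aligned_feats)]
    exps = [math.exp(tau * s) for s in sims]
    mean = sum(exps) / len(exps)
    return [e / mean for e in exps]


def misaligned_loss(probs: Sequence[float], label_a: int, label_b: int,
                    c_a: float, c_b: float, hard_w: float = 1.0,
                    gamma: float = 1.0) -> float:
    """Weighted soft-target cross-entropy for one misaligned sample.
    probs is the model's predicted class distribution on the misaligned input;
    c_a / c_b are the unimodal confidences of A in label_a and B in label_b."""
    t_a, t_b = soft_target(c_a, c_b)
    w_a, w_b = weak_modality_weights(c_a, c_b, gamma)
    ce = -(w_a * t_a * math.log(probs[label_a]) + w_b * t_b * math.log(probs[label_b]))
    return hard_w * ce
```

For example, with confidences `c_a = 0.9` and `c_b = 0.3`, the soft target puts 0.75 of the label mass on modality A's class and 0.25 on modality B's class, and the weak-modality weight raises modality B's term by a factor of 1.6, partly offsetting the bias toward the more confident modality that the abstract points out.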
