Rebalanced Multimodal Learning with Data-aware Unimodal Sampling

Qingyuan Jiang, Zhouyang Chi, Xiao Ma, Qirong Mao, Yang Yang, Jinhui Tang

TL;DR

The paper tackles modality imbalance in multimodal learning (MML) that arises from unimodal data sampling: drawing equal amounts of data from each modality often yields unequal informational content. It introduces Data-aware Unimodal Sampling (DUS), which uses a cumulative modality discrepancy score $\hat{s}^{(j)}_t$ to monitor learning and adapts the quantity of data sampled per modality at each iteration, using either heuristic rules or reinforcement learning (REINFORCE). DUS is designed as a plug-in for existing MML methods and is reported to achieve state-of-the-art performance across diverse datasets and modalities. Empirically, both the discrepancy-based monitor and the adaptive sampling policy reduce modality gaps and improve accuracy and MAP/Macro-F1, supporting the sampling-centric view of balancing multimodal learning. The work also analyzes robustness to hyper-parameters and discusses practical limitations related to data pairing and integration with certain loss structures.

Abstract

To address the modality learning degeneration caused by modality imbalance, existing multimodal learning (MML) approaches primarily attempt to balance the optimization process of each modality from the perspective of model learning. However, almost all existing methods ignore the modality imbalance caused by unimodal data sampling, i.e., equal unimodal data sampling often results in discrepancies in informational content, leading to modality imbalance. Therefore, in this paper, we propose a novel MML approach called Data-aware Unimodal Sampling (DUS), which aims to dynamically alleviate the modality imbalance caused by sampling. Specifically, we first propose a novel cumulative modality discrepancy to monitor the multimodal learning process. Based on the learning status, we propose heuristic and reinforcement learning (RL)-based data-aware unimodal sampling approaches that adaptively determine the quantity of sampled data at each iteration, thus alleviating the modality imbalance from the perspective of sampling. Meanwhile, our method can be seamlessly incorporated into almost all existing multimodal learning approaches as a plugin. Experiments demonstrate that DUS achieves the best performance compared with diverse state-of-the-art (SOTA) baselines.
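
The monitor-then-sample loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the discrepancy formula or the heuristic rule, so the running-mean score, the convention that a higher score means a better-learned modality, and the proportional allocation rule are all assumptions made for illustration. The names `update_cumulative_score` and `allocate_batch_sizes` are hypothetical.

```python
from typing import Dict


def update_cumulative_score(prev: float, current: float, step: int) -> float:
    """Running mean of per-iteration scores after `step` iterations (step >= 1).

    Assumed form of a "cumulative" score; the paper's definition may differ.
    """
    return prev + (current - prev) / step


def allocate_batch_sizes(scores: Dict[str, float], base_batch: int,
                         min_batch: int = 1) -> Dict[str, int]:
    """Hypothetical heuristic: modalities that lag behind get more samples.

    Assumption: a higher cumulative score means the modality is better learned.
    Weights are normalized so the average batch size stays near `base_batch`.
    """
    best = max(scores.values())
    # Each modality's weight grows with its gap to the best-learned modality.
    weights = {m: 1.0 + (best - s) for m, s in scores.items()}
    mean_w = sum(weights.values()) / len(weights)
    return {m: max(min_batch, round(base_batch * w / mean_w))
            for m, w in weights.items()}


# Toy per-iteration scores for two modalities (made-up numbers).
cumulative = {"audio": 0.0, "video": 0.0}
per_step = [{"audio": 0.9, "video": 0.3}, {"audio": 0.7, "video": 0.5}]
for t, step_scores in enumerate(per_step, start=1):
    for modality, value in step_scores.items():
        cumulative[modality] = update_cumulative_score(cumulative[modality], value, t)

sizes = allocate_batch_sizes(cumulative, base_batch=64)
print(sizes)  # the lagging modality (video here) receives the larger batch
```

In this sketch the sampling decision depends only on the monitored scores, which is what would let such a component sit in front of an existing training loop as a plug-in. An RL-based variant would replace `allocate_batch_sizes` with a learned policy.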

Paper Structure

This paper contains 24 sections, 15 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Relationship between performance and the quantity of sampled data on the Kinetics-Sounds dataset, where the rectangle and diamond markers denote the video and audio modalities, respectively. The average batch size is shown next to each marker in the matching color. By adjusting the batch size, we can affect modality discrepancy and thereby improve modality learning.
  • Figure 2: The architecture of our proposed DUS using the RL-based adaptive unimodal sampling as an example. DUS contains two important components, i.e., cumulative modality discrepancy calculation and adaptive unimodal sampling.
  • Figure 3: Impact of constant batch size $N_B$ and learning rate.
  • Figure 4: Change of batch size during the training process. Best viewed in color.
  • Figure 5: Change of cumulative modality discrepancy score during the training process. Best viewed in color.