Rethinking Multimodal Learning from the Perspective of Mitigating Classification Ability Disproportion

QingYuan Jiang, Longfei Huang, Yang Yang

TL;DR

This work tackles modality imbalance in multimodal learning by focusing on disparities in classification ability across modalities. It introduces sustained boosting, which jointly minimizes classification and residual errors, along with an adaptive classifier assignment mechanism that dynamically strengthens the weak modality. A theoretical result shows that the cross-modal loss gap ${\mathcal G}(\Phi)$ converges at rate ${\mathcal O}(1/T)$ under standard smoothness and strong convexity assumptions, which gives a convergence guarantee for the proposed boosting scheme. Empirically, the method delivers state-of-the-art performance across six diverse multimodal datasets and is robust to hyperparameter choices and missing-modality scenarios. The authors release code for reproducibility.

Abstract

Multimodal learning (MML) is significantly constrained by modality imbalance, leading to suboptimal performance in practice. While existing approaches primarily focus on balancing the learning of different modalities to address this issue, they fundamentally overlook the inherent disproportion in model classification ability, which serves as the primary cause of this phenomenon. In this paper, we propose a novel multimodal learning approach to dynamically balance the classification ability of weak and strong modalities by incorporating the principle of boosting. Concretely, we first propose a sustained boosting algorithm in multimodal learning by simultaneously optimizing the classification and residual errors. Subsequently, we introduce an adaptive classifier assignment strategy to dynamically facilitate the classification performance of the weak modality. Furthermore, we theoretically analyze the convergence property of the cross-modal gap function, ensuring the effectiveness of the proposed boosting scheme. To this end, the classification ability of strong and weak modalities is expected to be balanced, thereby mitigating the imbalance issue. Empirical experiments on widely used datasets reveal the superiority of our method through comparison with various state-of-the-art (SOTA) multimodal learning baselines. The source code is available at https://github.com/njustkmg/NeurIPS25-AUG.
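
The two ideas in the abstract, fitting extra classifiers to residual errors and assigning those classifiers to the weaker modality, can be illustrated with a small toy sketch. It is not the authors' implementation: the linear heads, the squared-error residual fit, the accuracy-based gap test, and every name (`fit_head`, `add_head`, `train`, `toy_data`, `gap_threshold`) are assumptions made for illustration. In particular, the sketch fits each new head to the residual alone, whereas the paper describes optimizing classification and residual errors simultaneously.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponent
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_head(x, target, lr=0.5, steps=200):
    """Fit one linear head to `target` by gradient descent on squared error."""
    w = np.zeros((x.shape[1], target.shape[1]))
    for _ in range(steps):
        grad = x.T @ (x @ w - target) / len(x)
        w -= lr * grad
    return w

def logits_of(x, heads, n_classes):
    """Ensemble logits of one modality: the sum of all of its head outputs."""
    out = np.zeros((len(x), n_classes))
    for w in heads:
        out += x @ w
    return out

def add_head(x, onehot, heads):
    """Append a head fitted to the residual left by the current ensemble."""
    residual = onehot - softmax(logits_of(x, heads, onehot.shape[1]))
    heads.append(fit_head(x, residual))

def accuracy(x, heads, labels, n_classes):
    return float((logits_of(x, heads, n_classes).argmax(axis=1) == labels).mean())

def train(features, labels, n_classes, rounds=4, gap_threshold=0.05):
    """Hypothetical assignment rule: give a new head to the weaker modality
    whenever the accuracy gap exceeds `gap_threshold` (an assumed parameter)."""
    onehot = np.eye(n_classes)[labels]
    heads = {m: [] for m in features}
    for m, x in features.items():  # every modality starts with one head
        add_head(x, onehot, heads[m])
    for _ in range(rounds):
        acc = {m: accuracy(x, heads[m], labels, n_classes) for m, x in features.items()}
        weak, strong = min(acc, key=acc.get), max(acc, key=acc.get)
        if acc[strong] - acc[weak] > gap_threshold:
            add_head(features[weak], onehot, heads[weak])
    acc = {m: accuracy(x, heads[m], labels, n_classes) for m, x in features.items()}
    return heads, acc

def toy_data(n=300, n_classes=3, seed=0):
    """Synthetic two-modality data: one nearly separable, one mostly noise."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n)
    onehot = np.eye(n_classes)[labels]
    strong = onehot + 0.1 * rng.normal(size=onehot.shape)
    weak = 0.3 * onehot + rng.normal(size=onehot.shape)
    return {"strong": strong, "weak": weak}, labels

if __name__ == "__main__":
    features, labels = toy_data()
    heads, acc = train(features, labels, n_classes=3)
    print({m: len(h) for m, h in heads.items()}, acc)
```

On this toy data the strong modality keeps its single head while the weak modality accumulates one extra head per round. That only shows the assignment mechanism; because all heads here are linear, the sketch makes no claim about the accuracy gains reported in the paper.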

Paper Structure

This paper contains 25 sections, 3 theorems, 34 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Under assumptions on the loss function and on the effectiveness of the sustained boosting algorithm, the cross-modal gap ${\mathcal G}(\Phi)$ converges at rate ${\mathcal O}(1/T)$, with a bound that depends on the constants $\nu$, $\kappa$, $L_a$ and $\beta$.
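
As a reading aid, a rate of ${\mathcal O}(1/T)$ has the generic shape below. The symbol $C$ is a placeholder for an unspecified function of the constants, and $\Phi_T$ (the model after $T$ iterations) is notation assumed here; neither is taken from the paper, and this is not the paper's stated bound.

```latex
% Illustrative shape of an O(1/T) guarantee; C is a placeholder constant.
\[
  {\mathcal G}(\Phi_T) \;\le\; \frac{C(\nu, \kappa, L_a, \beta)}{T}
  \qquad\Longrightarrow\qquad
  \frac{C}{2T} \;=\; \frac{1}{2}\cdot\frac{C}{T}.
\]
```

In a bound of this shape, doubling $T$ halves the guaranteed upper limit on the gap.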

Figures (7)

  • Figure 1: Comparison of naive MML, gradient-boosting-based MML (MML w/ GB), G-Blend (Wang et al., CVPR 2020), and our method on the CREMAD dataset. Enhancing the classification performance of the weak modality narrows the performance gap between the two modalities and improves overall performance.
  • Figure 2: Sensitivity to $\sigma$ (left) and $\lambda$ (right).
  • Figure 3: Performance comparison.
  • Figure 4: Visualization on CREMAD dataset. The video visualization highlights the need to improve weak modality classification.
  • Figure 5: Training time (hrs).
  • ...and 2 more figures

Theorems & Definitions (5)

  • Theorem 1: Convergence of Gap Loss, Informal
  • Lemma 1: Gap Bound
  • Proof 1: Proof of Lemma 1
  • Theorem 1: Convergence of ${\mathcal{G}}$ with Gradient Boosting
  • Proof 2: Proof of Theorem 1