Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning

Zechang Xiong, Da Li, Kexin Tang, Pengyuan Li, Wenkang Kong, Yulan Hu

Abstract

Multimodal models often converge to a dominant-modality solution, in which a stronger, faster-converging modality overshadows weaker ones. This modality imbalance leads to suboptimal performance. Existing methods attempt to balance modalities by reweighting gradients or losses, but they overlook the fact that each modality has a finite information capacity. In this work, we propose IIBalance, a multimodal learning framework that aligns modality contributions with Intrinsic Information Budgets (IIB). We introduce a task-grounded estimator of each modality's IIB and transform the estimated capacity into a global prior over modality contributions. Anchored by the highest-budget modality, we design a prototype-based relative alignment mechanism that corrects semantic drift only when weaker modalities deviate from their budgeted potential, rather than forcing imitation. At inference, a probabilistic gating module integrates the global budgets with sample-level uncertainty to generate calibrated fusion weights. Experiments on three representative benchmarks demonstrate that IIBalance consistently outperforms state-of-the-art balancing methods and achieves better utilization of complementary modality cues. Our code is available at: https://github.com/XiongZechang/IIBalance.

Paper Structure (28 sections, 13 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Absolute vs. relative modality balancing. Left: absolute balance enforces equal contribution across modalities, which can under-utilize high-capacity modalities while pushing low-capacity modalities to overfit residual noise. Right: relative balance allocates each modality’s contribution in accordance with its Intrinsic Information Budget (IIB), encouraging capacity-aware utilization.
  • Figure 2: Overview of IIBalance. An intrinsic information budget prior is estimated from unimodal prediction entropy. Stage I learns unimodal features with prototype-guided relative alignment. Stage II conducts uncertainty-aware Bayesian fusion using entropy-conditioned gating. Here, $\otimes$ denotes element-wise multiplication and $\oplus$ denotes element-wise addition.
  • Figure 3: Comparison between the IIB prior $\beta_m$ and the averaged fusion weights on Kinetics-Sounds, CREMA-D, and AVE.
  • Figure 4: Hyperparameter sensitivity of IIBalance.
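
The Figure 2 caption describes a budget prior estimated from unimodal prediction entropy, followed by entropy-conditioned gating at fusion time. The sketch below illustrates that general idea only. The specific formulas are assumptions made for illustration, not the paper's estimator or gating module: the prior $\beta_m$ is taken to be proportional to one minus the mean normalized entropy of modality $m$, and the per-sample fusion weight is taken to be proportional to $\beta_m$ times one minus that sample's normalized entropy. The function names `budget_prior` and `gated_fusion` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), clamped to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return min(1.0, h / math.log(len(probs)))


def budget_prior(mean_entropy):
    """ASSUMPTION (illustrative): beta_m is proportional to
    1 - (mean normalized entropy of modality m's unimodal predictions)."""
    confidence = {m: 1.0 - h for m, h in mean_entropy.items()}
    total = sum(confidence.values())
    return {m: c / total for m, c in confidence.items()}


def gated_fusion(logits, beta):
    """ASSUMPTION (illustrative): the weight of modality m for one sample is
    proportional to beta_m * (1 - normalized entropy of its prediction)."""
    probs = {m: softmax(z) for m, z in logits.items()}
    raw = {m: beta[m] * (1.0 - normalized_entropy(p)) for m, p in probs.items()}
    total = sum(raw.values())
    if total > 0.0:
        weights = {m: r / total for m, r in raw.items()}
    else:
        # Every modality is maximally uncertain: fall back to the global prior.
        weights = dict(beta)
    num_classes = len(next(iter(probs.values())))
    fused = [sum(weights[m] * probs[m][k] for m in probs) for k in range(num_classes)]
    return weights, fused


# Hypothetical numbers: audio is the more informative modality on average.
beta = budget_prior({"audio": 0.4, "visual": 0.7})
weights, fused = gated_fusion(
    {"audio": [4.0, 0.0, 0.0], "visual": [0.1, 0.0, 0.2]}, beta
)
```

In this toy setting the audio branch is confident on the sample while the visual branch is close to uniform, so the sample-level weight for audio rises above its global prior. A learned gating module would replace the closed-form weighting used here.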