DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian, Kai Han, Jiaxin Liu, Zhenlong Yuan, Zhengzhong Zhu, Jingchao Wang, Chongwen Lyu, Jun Chen, Zhe Liu
TL;DR
DynCIM tackles modality and sample imbalances in multimodal learning by introducing a dual curriculum: a sample-level difficulty assessment based on prediction deviation, consistency, and stability, and a modality-level curriculum using global (Geometric Mean Ratio) and local (Harmonic Mean Improvement Rate) measures. A gating-based dynamic fusion mechanism then adapts modality contributions in real time, guided by an adaptive balance between overall fusion effectiveness and individual modality optimization. The framework optimizes a joint curriculum objective that reweights informative samples and fusion signals, and extensive experiments on six benchmarks show consistent improvements over state-of-the-art methods with competitive computational efficiency. This approach enhances inter-modal cooperation, robustness to noise, and convergence speed, making multimodal models more scalable and reliable in heterogeneous data regimes.
Abstract
Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and modality representation capabilities. To address this, we introduce DynCIM, a novel dynamic curriculum learning framework designed to quantify the inherent imbalances from both sample and modality perspectives. DynCIM employs a sample-level curriculum to dynamically assess each sample's difficulty according to prediction deviation, consistency, and stability, while a modality-level curriculum measures modality contributions from both global and local perspectives. Furthermore, a gating-based dynamic fusion mechanism is introduced to adaptively adjust modality contributions, minimizing redundancy and optimizing fusion effectiveness. Extensive experiments on six multimodal benchmark datasets, spanning both bimodal and trimodal scenarios, demonstrate that DynCIM consistently outperforms state-of-the-art methods. Our approach effectively mitigates modality and sample imbalances while enhancing adaptability and robustness in multimodal learning tasks. Our code is available at https://github.com/Raymond-Qiancx/DynCIM.
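
To make the idea of gating-based fusion concrete, the following is a minimal sketch of a generic softmax gate over per-modality features. It is not taken from the DynCIM code base: the function name `gated_fusion`, the single linear gate (`gate_weights`, `gate_bias`), and the assumption that all modalities share one feature dimension are illustrative choices, and the abstract does not specify how the actual gate is parameterized or how the curriculum signals feed into it.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(features, gate_weights, gate_bias):
    """Fuse M modality features with per-sample softmax gates (hypothetical sketch).

    features:     list of M arrays, each of shape (batch, d)
    gate_weights: array of shape (M * d, M), assumed linear gate parameters
    gate_bias:    array of shape (M,)
    Returns the fused features (batch, d) and the gates (batch, M).
    """
    stacked = np.stack(features, axis=1)                 # (batch, M, d)
    concat = stacked.reshape(stacked.shape[0], -1)       # (batch, M * d)
    # One gate per modality and per sample; gates sum to 1 across modalities.
    gates = softmax(concat @ gate_weights + gate_bias, axis=-1)
    # Weighted sum of modality features using the gates.
    fused = (gates[:, :, None] * stacked).sum(axis=1)    # (batch, d)
    return fused, gates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(4, 8))    # placeholder features for one modality
    visual = rng.normal(size=(4, 8))   # placeholder features for another
    W = rng.normal(scale=0.1, size=(16, 2))
    b = np.zeros(2)
    fused, gates = gated_fusion([audio, visual], W, b)
    print(fused.shape, gates.sum(axis=1))
```

Because the gates are computed per sample, a modality that is uninformative for one input can be down-weighted there without being suppressed globally. In a trained model the gate parameters would be learned jointly with the encoders rather than sampled at random as in this usage example.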
