Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang, Di Hu
TL;DR
This work identifies a prime learning window early in multimodal training, during which information acquisition is imbalanced across modalities and information-sufficient modalities can suppress the others. It introduces Information Acquisition Regulation (InfoReg), which adaptively slows information acquisition for dominant modalities during this window using a Fisher Information-based metric and a per-batch regulation term. The method combines unimodal and multimodal losses with an adaptive coefficient alpha that depends on the observed performance gap between modalities, improving information uptake for information-insufficient modalities and yielding higher overall accuracy. Across CREMA-D, Kinetics Sounds, and CMU-MOSI, InfoReg outperforms existing imbalanced multimodal learning methods and is robust to the choice of fusion strategy and architecture; regulating within the prime window is shown to be essential for the gains. The work has practical implications for designing training schedules in multimodal systems, and code is released for reproducibility.
Abstract
Sensory training at an early age is vital for human development. Inspired by this cognitive phenomenon, we observe that the early training stage is also important for multimodal learning, since this is when dataset information is rapidly acquired. We refer to this stage as the prime learning window. However, we observe that the prime learning window in multimodal learning is often dominated by information-sufficient modalities, which in turn suppresses information acquisition by information-insufficient modalities. To address this issue, we propose Information Acquisition Regulation (InfoReg), a method designed to balance information acquisition among modalities. Specifically, InfoReg slows down the information acquisition of information-sufficient modalities during the prime learning window, which can promote information acquisition by information-insufficient modalities. This regulation enables a more balanced learning process and improves the overall performance of the multimodal network. Experiments show that InfoReg outperforms related imbalanced multimodal learning methods across various datasets. The code is available at https://github.com/GeWu-Lab/InfoReg_CVPR2025.
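
To make the regulation idea concrete, here is a minimal Python sketch of how its pieces could fit together. It is an illustration under stated assumptions, not the authors' implementation, which is in the repository linked above. The function names, the tanh form of the coefficient, and the way the penalty is added to the loss are all hypothetical. The only standard ingredient is the estimate of the trace of the empirical Fisher information as the mean squared norm of per-sample gradients.

```python
import math


def fisher_trace(per_sample_grads):
    """Estimate the trace of the empirical Fisher information matrix.

    per_sample_grads holds one gradient vector per sample (lists of floats),
    taken with respect to the parameters of a single modality encoder.
    The trace equals the mean squared gradient norm over the samples.
    """
    if not per_sample_grads:
        raise ValueError("need at least one gradient vector")
    total = sum(sum(g * g for g in grad) for grad in per_sample_grads)
    return total / len(per_sample_grads)


def regulation_coefficient(score, other_score):
    """Hypothetical alpha: zero unless this modality leads the other one.

    The tanh form is an assumption made for this sketch; it only captures
    the idea that a larger performance gap means stronger regulation.
    """
    return max(0.0, math.tanh(score - other_score))


def regulated_loss(multimodal_loss, unimodal_losses, alphas, penalties,
                   in_prime_window):
    """Combine the losses; the penalty applies only inside the prime window.

    unimodal_losses, alphas, and penalties are dicts keyed by modality name.
    penalties holds a per-batch regulation term for each modality
    (its exact form is left open here, since this sketch does not define it).
    """
    loss = multimodal_loss + sum(unimodal_losses.values())
    if in_prime_window:
        loss += sum(alphas[m] * penalties[m] for m in penalties)
    return loss


# Made-up unimodal scores for two modalities: only the leading one is slowed.
alphas = {
    "audio": regulation_coefficient(0.8, 0.3),
    "visual": regulation_coefficient(0.3, 0.8),
}
```

In this sketch the trailing modality always receives a coefficient of zero, so its information acquisition is never slowed, and outside the prime window the loss reduces to the plain sum of multimodal and unimodal terms.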
