Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise

Zijing Xu; Yunfeng Kou; Kunming Wu; Hong Liu

TL;DR

CAL introduces a contribution-guided asymmetric learning framework for robust multimodal fusion under modality imbalance and data noise. It combines a Shapley-inspired modality contribution metric, asymmetric gradient modulation with dynamic Softmax-based weighting, and an asymmetric information bottleneck that compresses noise while preserving task-relevant signals. The method achieves state-of-the-art results on five benchmarks and shows strong robustness to a range of noise attacks, supported by comprehensive ablations and visualizations. CAL is modular and transferable, with practical value for real-world multimodal systems in which modalities differ in value and data quality varies.
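The TL;DR describes the contribution metric as Shapley-inspired and the weighting as Softmax-based but gives no formulas. As an illustration only, the sketch below computes exact Shapley values over a small set of modalities from a coalition value function and turns them into temperature-scaled Softmax weights. The value function (for example, validation accuracy of a model restricted to a subset of modalities), the function names `shapley_contributions` and `softmax_weights`, the temperature, and the example accuracies are all hypothetical and not taken from the paper; CAL's actual metric and weighting may differ.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_contributions(modalities: Sequence[str],
                          value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each modality under a (hypothetical) coalition value function."""
    n = len(modalities)
    phi = {}
    for m in modalities:
        others = [o for o in modalities if o != m]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every coalition S of size k without m.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding modality m to coalition S.
                total += weight * (value(s | {m}) - value(s))
        phi[m] = total
    return phi


def softmax_weights(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Temperature-scaled Softmax over contribution scores (max-shifted for numerical stability)."""
    top = max(scores.values())
    exps = {m: exp((s - top) / temperature) for m, s in scores.items()}
    z = sum(exps.values())
    return {m: e / z for m, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical validation accuracies v(S) for every subset S of two modalities.
    accuracy = {
        frozenset(): 0.10,
        frozenset({"audio"}): 0.60,
        frozenset({"visual"}): 0.40,
        frozenset({"audio", "visual"}): 0.75,
    }
    phi = shapley_contributions(["audio", "visual"], accuracy.__getitem__)
    print(phi)
    print(softmax_weights(phi, temperature=0.1))
```

With two modalities the exact computation needs only four coalition evaluations; because the number of coalitions grows as 2^n, larger modality sets would typically call for sampling-based approximations.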

Abstract

Multimodal learning faces two major challenges, modality imbalance and data noise, both of which significantly degrade the robustness and generalization of models. Existing methods balance modalities by suppressing the dominant ones, but they neglect the inherent differences in information value between modalities, which can lead to convergence to suboptimal solutions. This paper proposes a modality compression paradigm, Contribution-Guided Asymmetric Learning (CAL), which enhances the contribution of high-contribution modalities while compressing weak modalities to increase their contribution, so that both improve the performance of multimodal information fusion. CAL is built on a modality contribution metric W^m that combines the information quantity I(m) and the confidence D(m). On top of this metric it designs two mechanisms: an asymmetric gradient acceleration mechanism and a contribution-aware Asymmetric Information Bottleneck (AIB). The former asymmetrically accelerates the gradient updates of the modalities, while the latter dynamically compresses the noise in low-contribution modalities. On five benchmark datasets covering emotion recognition, scene recognition, and event localization, CAL performs strongly in both imbalanced fusion tasks and noise robustness tests. On CREMA-D, KS, and AVE, CAL achieves 79.30%, 74.82%, and 74.21% accuracy, respectively, significantly outperforming the previous state-of-the-art model, ARL. In high-noise robustness tests on the MVSA-Single and NYUD2 datasets, CAL also achieves leading performance under various attack strategies. These results validate the advantages of CAL under modality imbalance and noise interference. As a flexible and efficient framework, CAL transfers easily to other tasks and has broad applicability.
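The abstract states that W^m combines the information quantity I(m) and the confidence D(m), and that the AIB compresses low-contribution modalities more strongly, but it does not give the exact forms. The following is a minimal sketch under stated assumptions: W^m is taken as the normalized product I(m) * D(m), each modality encoder is assumed to output a diagonal Gaussian as in a standard variational information bottleneck, and the per-modality compression weight is set to beta_m = beta_max * (1 - W^m). The helper names (`contribution_metric`, `gaussian_kl`, `aib_loss`), this choice of beta_m, and the example numbers are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def contribution_metric(info: Dict[str, float], conf: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical W^m: I(m) * D(m), normalized so the weights sum to 1."""
    raw = {m: info[m] * conf[m] for m in info}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def gaussian_kl(mu: List[float], logvar: List[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual variational IB compression term."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def aib_loss(task_loss: float,
             encodings: Dict[str, Tuple[List[float], List[float]]],
             w: Dict[str, float],
             beta_max: float = 1e-2) -> float:
    """Task loss plus a contribution-aware bottleneck penalty for each modality.

    ASSUMPTION: beta_m = beta_max * (1 - W^m), so lower-contribution
    modalities receive a larger compression weight.
    """
    penalty = 0.0
    for m, (mu, logvar) in encodings.items():
        beta_m = beta_max * (1.0 - w[m])
        penalty += beta_m * gaussian_kl(mu, logvar)
    return task_loss + penalty


if __name__ == "__main__":
    # Hypothetical per-modality statistics for a single training step.
    w = contribution_metric(info={"audio": 2.0, "visual": 1.0},
                            conf={"audio": 0.9, "visual": 0.6})
    # Identical Gaussian encodings, so any difference in penalty comes from beta_m alone.
    encodings = {"audio": ([1.0, 0.0], [0.0, 0.0]),
                 "visual": ([1.0, 0.0], [0.0, 0.0])}
    print(w)
    print(aib_loss(1.0, encodings, w, beta_max=0.1))
```

In a real training loop these quantities would be tensors and the KL term would be averaged over a batch; the scalar version only makes the contribution-dependent weighting explicit.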