Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
Heshan Fernando, Parikshit Ram, Yi Zhou, Soham Dan, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen
TL;DR
The paper tackles modality imbalance in multi-modal learning (MML) by reframing MML as a lexicographic multi-objective optimization problem that prioritizes the worst-performing uni-modal objective. It introduces MIMO, a gradient-based solver that optimizes a smoothly penalized objective combining the multi-modal loss with the modality-specific losses, where a smoothing term approximates the max over modalities. The authors prove convergence guarantees for the method and report improved performance over existing balanced MML and MOO baselines on diverse benchmarks, with up to ~20x reduction in subroutine computation time. The approach aims to improve generalization by preventing fast-learning modalities from dominating training, is adaptable to various multi-modal settings, and comes with code for replication. The authors point to early or hybrid fusion paradigms as promising extensions that would broaden applicability.
Abstract
Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance issue using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities and propose a gradient-based algorithm to solve the modified MML problem. We provide convergence guarantees for the proposed method, and empirical evaluations on popular MML benchmarks showcase its improved performance over existing balanced MML and MOO baselines, with up to ~20x reduction in subroutine computation time. Our code is available at https://github.com/heshandevaka/MIMO.
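
To make the idea of a smoothed max over modality losses concrete, here is a minimal Python sketch. It is not taken from the MIMO repository. It assumes the smoothing is a log-sum-exp with temperature `mu` and that the penalized objective is the multi-modal loss plus `lam` times the smoothed worst uni-modal loss. The function names, `mu`, and `lam` are all hypothetical, and the paper's exact formulation may differ.

```python
import math

def smooth_max(losses, mu=0.1):
    """Log-sum-exp approximation of max(losses); tighter as mu -> 0.

    Assumed smoothing choice, not necessarily the one used in the paper.
    """
    m = max(losses)  # subtract the max for numerical stability
    return m + mu * math.log(sum(math.exp((l - m) / mu) for l in losses))

def modality_weights(losses, mu=0.1):
    """Softmax weights, i.e. the gradient of smooth_max w.r.t. each loss.

    The worst-performing (highest-loss) modality receives the largest weight.
    """
    m = max(losses)
    exps = [math.exp((l - m) / mu) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_objective(multimodal_loss, unimodal_losses, lam=1.0, mu=0.1):
    """Hypothetical penalized objective: multi-modal loss + lam * smoothed max."""
    return multimodal_loss + lam * smooth_max(unimodal_losses, mu)

# Example: two modalities, where the second one is learning more slowly.
unimodal = [0.2, 1.5]
weights = modality_weights(unimodal)
objective = penalized_objective(0.5, unimodal)
```

Under these assumptions, the gradient of the penalty is a softmax-weighted sum of the uni-modal gradients, so the lagging modality dominates the update without requiring a separate balancing subroutine.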
