Injecting Imbalance Sensitivity for Multi-Task Learning
Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong
TL;DR
This work argues that imbalance among tasks, i.e., dominance by some tasks, can be more detrimental to optimization in multi-task learning than gradient conflict alone. It proposes IMGrad, an imbalance-sensitive gradient method that injects imbalance-sensitivity via a projected-norm constraint, effectively decoupling Pareto quality from convergence. The approach augments existing gradient-manipulation frameworks (e.g., CAGrad, Nash-MTL) by emphasizing projected-gradient norms and adaptive weighting to mitigate Pareto failures and unbalanced progress across tasks. Extensive experiments across supervised benchmarks (NYUv2, CityScapes, CelebA) and reinforcement learning (MT10) show IMGrad achieving competitive or state-of-the-art $\Delta m\%$ figures and robust improvements over baselines. The results highlight the practical impact of explicitly addressing imbalance in optimization-based MTL, pointing to imbalance-sensitivity as a key direction for future methods.
Abstract
Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. However, our paper empirically argues that these studies, specifically gradient-based ones, primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL. In line with this perspective, we enhance the existing baseline method by injecting imbalance-sensitivity through the imposition of constraints on the projected norms. To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance.
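The distinction between conflict and imbalance can be made concrete with a small numerical sketch. The code below is not IMGrad or any method from the paper; it is an illustration under our own assumptions, using two hypothetical task gradients `g1` and `g2` and a plain average as the update direction. Conflict is measured here as the cosine similarity between task gradients, imbalance as the ratio of their norms, and the projected norm of a task as the signed length of its gradient along the update direction.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine similarity; a negative value indicates gradient conflict."""
    return dot(u, v) / (norm(u) * norm(v))


def projected_norms(task_grads, direction):
    """Signed length of each task gradient along the update direction."""
    d_norm = norm(direction)
    return [dot(g, direction) / d_norm for g in task_grads]


# Hypothetical gradients: they point in compatible directions (no conflict),
# but task 1 has a far larger norm than task 2 (strong imbalance).
g1 = [10.0, 0.0]
g2 = [0.1, 0.1]

# Assumption: a plain average stands in for the shared update direction.
avg = [(a + b) / 2 for a, b in zip(g1, g2)]

conflict = cosine(g1, g2)
imbalance = max(norm(g1), norm(g2)) / min(norm(g1), norm(g2))
proj = projected_norms([g1, g2], avg)

print(f"cosine similarity: {conflict:.3f}")
print(f"norm ratio:        {imbalance:.1f}")
print(f"projected norms:   {proj[0]:.3f}, {proj[1]:.3f}")
```

In this toy case the cosine similarity is positive, so a conflict-only criterion would see nothing to fix, yet the averaged direction is almost entirely aligned with task 1 and the projected norm of task 2 is much smaller. This is the kind of unbalanced progress that a constraint on projected norms is meant to detect.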
