Injecting Imbalance Sensitivity for Multi-Task Learning

Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong

TL;DR

This work argues that imbalance among tasks, i.e., dominance by some tasks, can be more detrimental to optimization in multi-task learning than gradient conflict alone. It proposes IMGrad, an imbalance-sensitive gradient method that injects imbalance-sensitivity via a projected-norm constraint, effectively decoupling Pareto quality from convergence. The approach augments existing gradient-manipulation frameworks (e.g., CAGrad, Nash-MTL) by emphasizing projected-gradient norms and adaptive weighting to mitigate Pareto failures and unbalanced progress across tasks. Extensive experiments across supervised benchmarks (NYUv2, CityScapes, CelebA) and reinforcement learning (MT10) show IMGrad achieving competitive or state-of-the-art $\Delta m\%$ figures and robust improvements over baselines. The results highlight the practical impact of explicitly addressing imbalance in optimization-based MTL, pointing to imbalance-sensitivity as a key direction for future methods.

Abstract

Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. However, our paper empirically argues that these studies, specifically gradient-based ones, primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL. In line with this perspective, we enhance the existing baseline method by injecting imbalance-sensitivity through the imposition of constraints on the projected norms. To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance.

Paper Structure

This paper contains 17 sections, 12 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of the imbalance and conflict issues in multi-task learning. 'Bal' and 'Imb' represent balanced and imbalanced, while 'N-Con' and 'Con' represent non-conflicting and conflicting.
  • Figure 2: Comparison of MTL approaches on the imbalanced synthetic two-task benchmark. $\bullet$ and $\star$ represent the starting point and global optimum, respectively, and the grey line represents the Pareto front. The two objectives are weighted in an extremely imbalanced way, i.e., $(0.9\,\mathcal{L}_1, 0.1\,\mathcal{L}_2)$. Please refer to the Appendix for more optimization trajectories under various pre-defined task weights.
  • Figure 3: Statistical imbalance ratios of MTL approaches.
  • Figure 4: Comparison of MTL approaches on the toy examples. We use the tool provided by CAGrad to generate the synthetic toy examples, with the two objectives shown in (b) and (c). In this case, both objectives are equally weighted.
  • Figure 5: Individual gradient similarity and progress analysis of MTL algorithms on CityScapes. (a)-(c) show the gradient similarities between individuals and the combined gradient; (d)-(e) present the progress of individuals during optimization.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Definition 1: Gradient Similarity
  • Definition 2: Imbalance of Individuals
  • Definition 3: Pareto Property
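
The definitions above are listed by name only, so their exact formulas are not reproduced here. As a rough illustration of the quantities the summary discusses, the sketch below assumes that gradient similarity is the cosine between two task gradients, that imbalance is the ratio of the largest to the smallest gradient norm, and that a projected norm is the signed length of an update direction projected onto a task gradient. These readings, the function names, and the toy vectors are all assumptions made for illustration, not the paper's definitions or the IMGrad algorithm.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Assumed 'gradient similarity': a negative value indicates conflict."""
    return dot(u, v) / (norm(u) * norm(v))


def imbalance_ratio(grads):
    """Assumed 'imbalance': largest gradient norm over smallest (1.0 = balanced)."""
    norms = [norm(g) for g in grads]
    return max(norms) / min(norms)


def projected_norm(update, grad):
    """Signed length of `update` projected onto the task gradient `grad`."""
    return dot(update, grad) / norm(grad)


# Hypothetical two-task gradients: task 1 has a much larger norm than task 2.
g1 = [3.0, 4.0]
g2 = [0.6, -0.8]

# Plain averaging, used here only as a simple baseline update direction.
avg = [(a + b) / 2 for a, b in zip(g1, g2)]

similarity = cosine_similarity(g1, g2)
ratio = imbalance_ratio([g1, g2])
proj_task1 = projected_norm(avg, g1)
proj_task2 = projected_norm(avg, g2)

print(f"similarity={similarity:.3f} imbalance={ratio:.3f}")
print(f"projection on task 1={proj_task1:.3f}, on task 2={proj_task2:.3f}")
```

In this toy case the averaged direction has a positive projection onto the large-norm task and a negative projection onto the small-norm task, which is the kind of dominance that an imbalance-sensitive method would aim to avoid.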