Fair Resource Allocation in Multi-Task Learning

Hao Ban, Kaiyi Ji

TL;DR

This work addresses the challenge of conflicting gradients in multi-task learning by framing optimization as a utility-maximization problem with …

Abstract

By jointly learning multiple tasks, multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance. However, a major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks and subsequently impede MTL's ability to achieve better overall performance. Inspired by fair resource allocation in communication networks, we formulate the optimization of MTL as a utility maximization problem, where the loss decreases across tasks are maximized under different fairness measurements. To solve this problem, we propose FairGrad, a novel MTL optimization method. FairGrad not only enables flexible emphasis on certain tasks but also achieves a theoretical convergence guarantee. Extensive experiments demonstrate that our method can achieve state-of-the-art performance among gradient manipulation methods on a suite of multi-task benchmarks in supervised learning and reinforcement learning. Furthermore, we incorporate the idea of $\alpha$-fairness into loss functions of various MTL methods. Extensive empirical studies demonstrate that their performance can be significantly enhanced. Code is provided at https://github.com/OptMN-Lab/fairgrad.
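The utility-maximization view in the abstract can be made concrete with the standard $\alpha$-fair utility from network resource allocation. The derivation below is a hedged sketch under an assumption this summary does not state: task $i$'s loss decrease along an update direction $d$ is modeled to first order by $g_i^\top d$, where $g_i$ is the gradient of $l_i$ and $G = [g_1, \dots, g_K]$ stacks the task gradients as columns. It is one natural reading of the formulation, not a reproduction of the paper's equations.

```latex
u_\alpha(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \ne 1,\\[4pt]
\log x, & \alpha = 1,
\end{cases}
\qquad u_\alpha'(x) = x^{-\alpha} \quad (x > 0).

\max_{\|d\| \le 1} \ \sum_{i=1}^{K} u_\alpha\big(g_i^\top d\big)
\;\Longrightarrow\;
\sum_{i=1}^{K} \big(g_i^\top d\big)^{-\alpha} g_i = \lambda d, \qquad \lambda > 0.

\text{Setting } d \propto G w \text{ and rescaling } w \ (\alpha > 0):\qquad
G^\top G\, w = w^{-1/\alpha} \quad \text{(elementwise power)}.
```

With $\alpha = 0$ the objective is just $\sum_i g_i^\top d$, i.e. linear scalarization. For $\alpha = 1$ the condition reads $G^\top G w = 1/w$, the bargaining condition of Nash-MTL, which fits the remark that proportional fairness resembles Nash-MTL. Formally, as $\alpha \to \infty$ the right-hand side tends to the all-ones vector, so every task receives the same first-order decrease, which is what max-min fairness prescribes.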

Paper Structure (26 sections, 4 theorems, 34 equations, 1 figure, 11 tables, 1 algorithm)

Key Result

Proposition 6.1

The Pareto front of the $\alpha$-fair loss functions defined in Eq. (alpha_trans) is the same as that of the original loss functions $(l_1, \dots, l_K)$.
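Proposition 6.1 is what one expects from any strictly increasing per-task transformation, because such a map preserves componentwise dominance between loss vectors. The sketch below checks this numerically on synthetic data. It assumes, hypothetically, that the $\alpha$-fair loss applies the standard utility $u_\alpha$ to each positive loss; the exact form of Eq. (alpha_trans) is not reproduced in this summary, and `pareto_mask` and `alpha_fair_transform` are illustrative names.

```python
import numpy as np


def pareto_mask(losses):
    """Return a boolean mask of the non-dominated rows of `losses` (lower is better).

    Row j dominates row i if it is <= in every task and < in at least one task.
    """
    losses = np.asarray(losses, dtype=float)
    mask = np.ones(losses.shape[0], dtype=bool)
    for i, row in enumerate(losses):
        dominates_i = np.all(losses <= row, axis=1) & np.any(losses < row, axis=1)
        mask[i] = not dominates_i.any()
    return mask


def alpha_fair_transform(losses, alpha):
    """Hypothetical alpha-fair loss: applies u_alpha elementwise to positive losses.

    u_alpha(l) = log(l) if alpha == 1, else l**(1 - alpha) / (1 - alpha).
    Its derivative l**(-alpha) is positive, so it is strictly increasing in l.
    """
    losses = np.asarray(losses, dtype=float)
    if alpha == 1:
        return np.log(losses)
    return losses ** (1.0 - alpha) / (1.0 - alpha)


# Random positive loss vectors for K = 3 tasks (synthetic data, illustration only).
rng = np.random.default_rng(0)
losses = rng.uniform(0.1, 2.0, size=(200, 3))
front = pareto_mask(losses)
for alpha in (0.0, 0.5, 1.0, 2.0, 5.0):
    same = np.array_equal(front, pareto_mask(alpha_fair_transform(losses, alpha)))
    print(f"alpha={alpha}: same Pareto set -> {same}")
```

The brute-force dominance check is quadratic in the number of points, which is fine for a sanity check but not meant for large sets.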

Figures (1)

  • Figure 1: An illustrative two-task example from [navon2022multi] showing the convergence of FairGrad to the Pareto front from different initialization points (black dots $\bullet$). The optimization trajectories are colored from orange to purple, and the bold gray line marks the Pareto front. From left to right, the panels show four fairness concepts: simple average (i.e., Linear Scalarization (LS)), proportional fairness, minimum potential delay (MPD) fairness, and max-min fairness. LS is biased toward task $2$, which has the larger gradient. FairGrad with proportional fairness resembles Nash-MTL [navon2022multi] and finds more balanced solutions along the Pareto front. MPD fairness aims to minimize the overall time for all tasks to converge and shifts slightly more attention to struggling tasks with smaller gradients. Max-min fairness places more emphasis on the less fortunate task, the one with the smaller gradient magnitude. Observe also that FairGrad converges to the Pareto front from every initialization point.
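The qualitative trend in Figure 1 can be reproduced for a single update step with a small brute-force experiment. This is a sketch under stated assumptions, not FairGrad itself: it models task $i$'s loss decrease along a unit direction $d$ as $g_i^\top d$, maximizes $\sum_i u_\alpha(g_i^\top d)$ by scanning angles on the unit circle, and uses two hypothetical orthogonal gradients. The helper names `alpha_utility` and `fair_direction_2d` are illustrative.

```python
import numpy as np


def alpha_utility(x, alpha):
    """Standard alpha-fair utility for positive x: log(x) if alpha == 1,
    else x**(1 - alpha) / (1 - alpha)."""
    x = np.asarray(x, dtype=float)
    if alpha == 1:
        return np.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def fair_direction_2d(grads, alpha, num_angles=200_001):
    """Hypothetical helper: brute-force the unit direction d in R^2 that maximizes
    sum_i u_alpha(g_i . d), where g_i . d models task i's first-order loss decrease.

    Only directions with g_i . d > 0 for every task are considered, because
    u_alpha is defined for positive arguments when alpha > 0.
    """
    grads = np.asarray(grads, dtype=float)                   # shape (K, 2)
    theta = np.linspace(-np.pi, np.pi, num_angles)
    dirs = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (num_angles, 2)
    decreases = dirs @ grads.T                               # shape (num_angles, K)
    feasible = np.all(decreases > 0, axis=1)
    if not feasible.any():
        raise ValueError("no direction decreases every task loss to first order")
    scores = np.full(num_angles, -np.inf)
    scores[feasible] = alpha_utility(decreases[feasible], alpha).sum(axis=1)
    best = int(np.argmax(scores))
    return dirs[best], decreases[best]


# Two orthogonal task gradients: task 1 has a small gradient, task 2 a large one.
grads = np.array([[1.0, 0.0],
                  [0.0, 4.0]])
for alpha in (0.0, 1.0, 2.0, 10.0):
    d, dec = fair_direction_2d(grads, alpha)
    print(f"alpha={alpha:>4}: d={np.round(d, 3)}, loss decreases={np.round(dec, 3)}")
```

For these two gradients the optimum can be checked by hand: setting the derivative of $u_\alpha(\cos\theta) + u_\alpha(4\sin\theta)$ to zero gives $\tan\theta = 4^{(1-\alpha)/(1+\alpha)}$. So $\alpha = 0$ (LS) gives $\tan\theta = 4$ and leans toward task 2, $\alpha = 1$ gives $\theta = \pi/4$, $\alpha = 2$ (MPD) tilts slightly toward task 1, and as $\alpha \to \infty$, $\tan\theta \to 1/4$, where both tasks get the same first-order decrease ($\cos\theta = 4\sin\theta$), the max-min outcome. The sketch looks at one update direction rather than the full trajectories plotted in the figure.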

Theorems & Definitions (7)

  • Proposition 6.1
  • Theorem 7.3
  • Proof sketch
  • Theorem 2.1: Restatement of Theorem 7.3
  • Proof
  • Proposition 2.2: Restatement of Proposition 6.1
  • Proof