Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone

Negar Hassanpour, Muhammad Kamran Janjua, Kunlin Zhang, Sepehr Lavasani, Xiaowen Zhang, Chunhua Zhou, Chao Gao

TL;DR

This work tackles gradient conflicts in multi-task learning by introducing ConicGrad, a gradient update within a cone around the reference gradient $g_0$ defined by the average task gradient. It formulates a constrained max-min objective, yields a closed-form update via duality and an efficient Sherman–Morrison–Woodbury-based computation for $d^{*}$, and decouples direction from magnitude through normalization. The authors establish convergence guarantees under standard Lipschitz assumptions and demonstrate, across toy, supervised, and reinforcement learning benchmarks, that ConicGrad often achieves state-of-the-art or competitive performance with favorable scalability and stability. The method offers practical benefits for high-dimensional models and diverse task sets, with future work focusing on dynamically adapting the cone angle $c$ during training.
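Written out, the description above suggests a problem of roughly the following shape. This is a sketch assembled from the summary, not one of the paper's numbered equations: $K$ (the number of tasks) and $g_i$ (the gradient of task $i$) are notation assumed here, and the max-min objective is written in the form suggested by the Figure 2 caption.

```latex
g_0 = \frac{1}{K}\sum_{i=1}^{K} g_i, \qquad
\max_{d}\; \min_{1 \le i \le K} \langle g_i, d \rangle
\quad \text{s.t.} \quad
\frac{\langle g_0, d \rangle}{\lVert g_0 \rVert \, \lVert d \rVert} \ge c .
```

The constraint bounds the cosine of the angle between $d$ and $g_0$ from below by $c$, which is the same as requiring that angle to be at most $\arccos(c)$. Because the cosine does not change when $d$ is rescaled, the constraint restricts only the direction of the update, not its length.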

Abstract

Balancing competing objectives remains a fundamental challenge in multi-task learning (MTL), primarily due to conflicting gradients across individual tasks. A common solution relies on computing a dynamic gradient update vector that balances competing tasks as optimization progresses. Building on this idea, we propose ConicGrad, a principled, scalable, and robust MTL approach formulated as a constrained optimization problem. Our method introduces an angular constraint to dynamically regulate gradient update directions, confining them within a cone centered on the reference gradient of the overall objective. By balancing task-specific gradients without over-constraining their direction or magnitude, ConicGrad effectively resolves inter-task gradient conflicts. Moreover, our framework ensures computational efficiency and scalability to high-dimensional parameter spaces. We conduct extensive experiments on standard supervised learning and reinforcement learning MTL benchmarks, and demonstrate that ConicGrad achieves state-of-the-art performance across diverse tasks.

Paper Structure

This paper contains 30 sections, 6 theorems, 42 equations, 7 figures, 5 tables, 1 algorithm.

Key Result

Proposition 3.1

Given the optimization problem in eq:org_eq, its Lagrangian in eq:primal, and the dual of the primal problem in eq:obj, and assuming the Slater condition holds, the optimal gradient update $d^{*}$ has a closed-form expression in which $\mathbb{I}$ denotes the $M \times M$ identity matrix.
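The TL;DR attributes the efficient computation of $d^{*}$ to the Sherman–Morrison–Woodbury identity. The sketch below illustrates that identity on a generic system of the form $(\alpha \mathbb{I} + U V^{\top})x = b$. It does not use the matrix from Proposition 3.1, whose exact form is not shown here. The names `woodbury_solve`, `alpha`, `U`, `V`, and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np

def woodbury_solve(alpha, U, V, b):
    """Solve (alpha * I_M + U @ V.T) x = b using only a K x K linear solve.

    U, V: (M, K) arrays with K much smaller than M; alpha: nonzero scalar;
    b: (M,) vector. The M x M matrix is never formed explicitly.
    """
    K = U.shape[1]
    # Woodbury identity with A = alpha * I:
    # (aI + U V^T)^{-1} = I/a - U (I_K + V^T U / a)^{-1} V^T / a^2
    small = np.eye(K) + (V.T @ U) / alpha          # K x K system
    correction = U @ np.linalg.solve(small, V.T @ b)
    return b / alpha - correction / alpha**2

# Illustrative sizes (assumed): M parameters, K low-rank directions.
rng = np.random.default_rng(0)
M, K = 2000, 3
U = rng.standard_normal((M, K))
b = rng.standard_normal(M)
alpha = 5.0

x = woodbury_solve(alpha, U, U, b)
# Check the residual of the original system without building an M x M matrix.
residual = np.linalg.norm(alpha * x + U @ (U.T @ x) - b)
```

The only linear solve is $K \times K$, and the remaining work is a handful of matrix-vector products that grow linearly with $M$. This is the kind of saving that makes an $M \times M$ inverse tractable when $M$ is the number of model parameters.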

Figures (7)

  • Figure 1: Toy Experiment. The four plots on the left visualize the loss trajectories of various MTL methods from 5 initialization points ($\bullet$) on a toy 2-task learning problem (see \ref{sec:toyres} and \ref{app:toy_detail} for more details). Trajectories transition from blue to green, indicating progress over time. All 5 initialization points for FAMO reach the Pareto front (gray curve), while 3 for NashMTL and all 5 for both CAGrad and ConicGrad reach the global minimum ($\bigstar$), with ConicGrad converging significantly faster. The plot on the far right compares the convergence speeds over training steps, showing that ConicGrad achieves the lowest loss (dashed black line) faster than all competing methods.
  • Figure 2: Visual Illustration of Update Vectors. Inspired by liu2021conflict, we illustrate the update vector $d$ (in red) for a two-task learning problem using various gradient descent methods: GD, MGDA, PCGrad, CAGrad, and ConicGrad. Task-specific gradients $g_1$ and $g_2$ are in black and the reference objective gradient $g_0$ is in blue. PCGrad projects each gradient onto the plane orthogonal to the other (dashed arrows) and averages the projections. CAGrad determines $d$ by maximizing the minimum improvement across both tasks within a constrained region around the reference gradient $g_0$. ConicGrad determines $d$ by constraining the update direction to lie within a cone centered around $g_0$ with an angle at most $\varphi=\arccos(c)$, ensuring alignment while allowing more flexibility.
  • Figure 3: Visualizing Conic vs. Directional Constraints. We visualize ConicGrad and CAGrad liu2021conflict constraints in a toy setup. The x and y axes denote all possible direction vectors in $2$D space $\mathbb{R}^{2}$, and the plot indicates which vectors in this space satisfy ConicGrad and CAGrad constraints.
  • Figure 4: Scalability Experiments on CelebA. We measure the computational overhead of MTL methods as the model size increases (in terms of number of parameters) to illustrate how these methods scale.
  • Figure 5: Contour Plots of $c$ and $\gamma$ on three MTL benchmarks. We ablate the hyperparameters $\gamma \in [0.001, 0.01]$ on the x-axes and $c \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$ on the y-axes. The raw data consists of discrete values for $\gamma$ and $c$ at specific points, and we use interpolation to fill in the gaps to create a continuous surface that reveals how $\Delta m\%$ (darker areas indicate better performance) varies across the hyperparameter space.
  • ...and 2 more figures
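
The comparison described in the Figure 3 caption can be approximated with a short sketch that marks which 2D vectors satisfy each constraint. The conic test follows the Figure 2 caption (angle to $g_0$ at most $\arccos(c)$). The CAGrad test assumes the norm-ball form $\lVert d - g_0 \rVert \le c \lVert g_0 \rVert$, which is not stated in this summary. The function names, the grid, and the values of `g0` and `c` are illustrative choices, not the paper's setup.

```python
import numpy as np

def conic_mask(D, g0, c):
    """True where the angle between a row of D and g0 is at most arccos(c)."""
    norms = np.linalg.norm(D, axis=1) * np.linalg.norm(g0)
    cos = (D @ g0) / np.maximum(norms, 1e-12)   # guard against d = 0
    return cos >= c

def ball_mask(D, g0, c):
    """True where ||d - g0|| <= c * ||g0|| (assumed CAGrad-style constraint)."""
    return np.linalg.norm(D - g0, axis=1) <= c * np.linalg.norm(g0)

g0 = np.array([1.0, 0.0])   # reference gradient (illustrative)
c = 0.5                     # constraint parameter (illustrative)

# Candidate direction vectors on a regular grid over [-3, 3]^2.
xs = np.linspace(-3.0, 3.0, 121)
X, Y = np.meshgrid(xs, xs)
D = np.stack([X.ravel(), Y.ravel()], axis=1)

cone = conic_mask(D, g0, c)   # feasible set under the angular constraint
ball = ball_mask(D, g0, c)    # feasible set under the norm-ball constraint
```

Reshaping `cone` and `ball` to `X.shape` gives two boolean images that can be plotted side by side. For this choice of `c`, every grid point inside the ball also lies inside the cone. The cone additionally admits vectors of any length, since its test depends only on direction.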

Theorems & Definitions (13)

  • Proposition 3.1
  • proof
  • Theorem 3.2
  • proof
  • Proposition 1.1
  • proof
  • Theorem 1.2
  • proof
  • Proposition 1.3
  • proof
  • ...and 3 more