Quantifying Task Priority for Multi-Task Optimization

Wooseong Jeong, Kuk-Jin Yoon

TL;DR

This work addresses negative transfer in multi-task learning by reframing parameter updates in terms of task priority and connection strength. It introduces a two-phase optimization: Phase 1 learns per-task priorities by updating shared parameters sequentially through task-specific connections, aiming to discover new Pareto-optimal solutions; Phase 2 preserves these priorities by computing a normalized connection-strength measure and projecting gradients to align with the top-priority task per channel. The authors prove that incorporating task priority expands the Pareto frontier and provide convergence arguments, and they validate the approach across NYUD-v2, PASCAL-Context, and Cityscapes, showing superior multi-task performance over gradient-manipulation baselines under various loss-scaling schemes. The method achieves robustness across architectures with minimal parameter overhead and demonstrates practical impact for complex, multi-task vision systems.
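
As a rough illustration of the sequential updates attributed to Phase 1 above, the following toy sketch applies one task's gradient at a time to a shared parameter vector instead of summing all task gradients first. The function names, the quadratic toy losses, and the omission of task-specific parameters are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def sequential_task_updates(
    theta: Vector, task_grad_fns: Sequence[GradFn], lr: float = 0.1, steps: int = 1
) -> Vector:
    """Update shared parameters with one task's gradient at a time.

    Hypothetical helper: each inner iteration takes a plain gradient step
    using a single task's gradient, so later tasks see the parameters
    already moved by earlier tasks.
    """
    theta = list(theta)  # copy so the caller's list is not modified
    for _ in range(steps):
        for grad_fn in task_grad_fns:
            grad = grad_fn(theta)
            theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def quadratic_grad(target: float) -> GradFn:
    """Gradient of the toy loss 0.5 * (theta - target)^2, applied elementwise."""
    return lambda theta: [t - target for t in theta]


# Two toy tasks that pull the shared parameter toward 0.0 and 2.0.
start = [5.0]
final = sequential_task_updates(
    start, [quadratic_grad(0.0), quadratic_grad(2.0)], lr=0.1, steps=200
)
```

In this toy setting the iterate settles at a point between the two task targets. How the real method routes updates through task-specific connections is not specified in this summary.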

Abstract

The goal of multi-task learning is to learn diverse tasks within a single unified network. As each task has its own unique objective function, conflicts emerge during training, resulting in negative transfer among them. Earlier research identified these conflicting gradients in shared parameters between tasks and attempted to realign them in the same direction. However, we prove that such optimization strategies lead to sub-optimal Pareto solutions due to their inability to accurately determine the individual contributions of each parameter across various tasks. In this paper, we propose the concept of task priority to evaluate parameter contributions across different tasks. To learn task priority, we identify the type of connections related to links between parameters influenced by task-specific losses during backpropagation. The strength of connections is gauged by the magnitude of parameters to determine task priority. Based on these, we present a new method named connection strength-based optimization for multi-task learning which consists of two phases. The first phase learns the task priority within the network, while the second phase modifies the gradients while upholding this priority. This ultimately leads to finding new Pareto optimal solutions for multiple tasks. Through extensive experiments, we show that our approach greatly enhances multi-task performance in comparison to earlier gradient manipulation methods.
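
To make the second phase more concrete, here is a minimal sketch for a single channel. It assumes that each task has a non-negative connection strength, that the top-priority task is the one with the largest normalized strength, and that a lower-priority gradient is "aligned" by removing its component that opposes the top-priority gradient. The normalization by the sum and the projection rule are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_priority_task(strengths: Sequence[float]) -> int:
    """Index of the task with the largest normalized connection strength."""
    total = sum(strengths)
    # Assumed normalization: divide by the sum so the strengths add up to 1.
    normalized = [s / total for s in strengths] if total > 0 else list(strengths)
    return max(range(len(normalized)), key=normalized.__getitem__)


def align_to_priority(grads: Sequence[Vector], strengths: Sequence[float]) -> Vector:
    """Combine per-task gradients for one channel, favoring the top-priority task.

    Any gradient with a negative inner product against the top-priority
    gradient has that opposing component projected out before summation
    (an assumed projection rule for illustration).
    """
    top = top_priority_task(strengths)
    g_top = grads[top]
    norm_sq = dot(g_top, g_top)
    combined = list(g_top)
    for k, g in enumerate(grads):
        if k == top:
            continue
        inner = dot(g, g_top)
        if inner < 0 and norm_sq > 0:
            g = [x - inner / norm_sq * y for x, y in zip(g, g_top)]
        combined = [c + x for c, x in zip(combined, g)]
    return combined


# Task 0 has the larger strength, so task 1's opposing component is removed.
update = align_to_priority([[1.0, 0.0], [-1.0, 1.0]], strengths=[0.7, 0.3])
```

After the projection, the combined update no longer has a negative inner product with the top-priority gradient, which is the property this sketch is meant to show.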

Paper Structure

This paper contains 30 sections, 6 theorems, 50 equations, 6 figures, 16 tables, 1 algorithm.

Key Result

Theorem 1

Updating gradients based on task priority for shared parameters $\Theta_s$ (update $g_i$ for each $\theta_{s,i}$) results in a smaller multi-task loss $\sum_{i=1}^{\mathcal{K}} w_i \mathcal{L}_i$ compared to updating the weighted summation of task-specific gradients $\sum_{i=1}^{\mathcal{K}} \nabla \dots$

Figures (6)

  • Figure 1: Overview of our connection strength-based optimization. (a) Previous methods [RN36, RN20, RN18, senushkin2023independent] modify gradients in shared parameters to converge toward an intermediate direction without considering the task priority, which leads to sub-optimal Pareto solutions. (b) Our method divides the optimization process into two distinct phases. In Phase 1, task priority is learned through task-specific connections, leading to the identification of a new Pareto optimal solution. In Phase 2, task priority is gauged using the connection strength between shared and task-specific nodes. Subsequently, gradients in shared parameters are aligned with the direction of the highest-priority task's gradients. This phase ensures that priorities established in Phase 1 are maintained, thus reducing potential negative transfer.
  • Figure 2: Comparison of training losses on NYUD-v2 and PASCAL-Context. Our method finds a new Pareto optimal solution for multiple tasks.
  • Figure 3: Correlation of loss trends across tasks over the training epochs. (a) Phase 1, (b) Phase 2.
  • Figure 4: Visualization of the percentage of top-priority tasks over the training epochs. (a) Phase 1, (b) mixing Phase 1 and Phase 2.
  • Figure 5: Visualization of the percentage of top-priority tasks over the training epochs, depending on the position in the network. We randomly selected several convolutional layers from the network. The timing at which task priority stabilizes varies depending on the position of the convolutional layer.
  • ...and 1 more figure
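
The "percentage of top-priority tasks" shown in Figures 4 and 5 can be read as a per-layer statistic: for each channel, find the task with the largest connection strength, then report each task's share of channels. The sketch below computes that share. The input layout (one list of per-task strengths per channel) and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def top_priority_share(
    strengths_per_channel: Sequence[Sequence[float]],
) -> Dict[int, float]:
    """Map each task index to the percentage of channels where it ranks first.

    Assumes every channel lists one strength per task, in the same task order.
    Ties go to the lowest task index.
    """
    num_channels = len(strengths_per_channel)
    num_tasks = len(strengths_per_channel[0])
    winners = Counter(
        max(range(num_tasks), key=channel.__getitem__)
        for channel in strengths_per_channel
    )
    return {k: 100.0 * winners.get(k, 0) / num_channels for k in range(num_tasks)}


# Four channels, two tasks: task 0 ranks first in three of the four channels.
layer_strengths = [[0.6, 0.4], [0.2, 0.8], [0.9, 0.1], [0.7, 0.3]]
shares = top_priority_share(layer_strengths)
```

Tracking this dictionary once per epoch for a given layer would give curves of the kind the captions describe.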

Theorems & Definitions (16)

  • Definition 1: Pareto optimality
  • Definition 2: Conflicting gradients
  • Definition 3: Task priority
  • Theorem 1
  • Definition 4: Task-specific connection
  • Theorem 1
  • proof
  • Definition 5: Pareto stationarity
  • Theorem 2: Convergence of Phase 1
  • proof
  • ...and 6 more
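
Only the names of Definitions 1 and 2 appear in this listing, so the sketch below uses the standard textbook forms as an assumption: two task gradients conflict when their inner product is negative, and a solution is Pareto optimal among a set of candidates when no other candidate is at least as good on every task loss and strictly better on one. The paper's exact wording may differ.

```python
from typing import List, Sequence, Tuple


def is_conflicting(g_i: Sequence[float], g_j: Sequence[float]) -> bool:
    """Standard notion of conflicting gradients: negative inner product."""
    return sum(a * b for a, b in zip(g_i, g_j)) < 0


def dominates(losses_a: Sequence[float], losses_b: Sequence[float]) -> bool:
    """True if a is no worse than b on every task and strictly better on one."""
    no_worse = all(a <= b for a, b in zip(losses_a, losses_b))
    strictly_better = any(a < b for a, b in zip(losses_a, losses_b))
    return no_worse and strictly_better


def pareto_front(
    candidates: Sequence[Tuple[float, ...]],
) -> List[Tuple[float, ...]]:
    """Candidates that no other candidate dominates (lower loss is better)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Per-task losses for three hypothetical solutions of a two-task problem.
front = pareto_front([(1.0, 3.0), (2.0, 2.0), (3.0, 3.0)])
```

Here the third candidate is dominated by the second, while the first two trade one task's loss against the other and therefore both remain on the front.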