Graph Coloring for Multi-Task Learning

Santosh Patapati

TL;DR

Gradient interference in multi-task learning (MTL) slows convergence and degrades performance when tasks pull updates in conflicting directions. SON-GOKU addresses this by estimating cross-task gradient interference online, constructing a sparse conflict graph, and greedily coloring it to form low-conflict task groups that are updated sequentially; the graph is refreshed periodically to track evolving task relationships. The approach offers descent guarantees, preserves the standard nonconvex SGD rate up to a small factor, and can recover population-level task partitions with high probability. Empirically, SON-GOKU yields consistent improvements across six datasets and often combines well with existing MTL optimizers such as AdaTask and PCGrad, while keeping time and memory costs scalable, which makes interference-aware scheduling practical for diverse multi-task settings.

Abstract

When objectives conflict with each other in multi-task learning, their gradients interfere, which slows convergence and can reduce the final model's performance. To address this, we introduce SON-GOKU, a scheduler that computes gradient interference, constructs an interference graph, and then applies greedy graph coloring to partition tasks into groups that align well with each other. At each training step, only one group (color class) of tasks is activated, and the partition is recomputed periodically as task relationships evolve throughout training. By ensuring that each mini-batch contains only tasks that pull the model in the same direction, our method improves the effectiveness of any underlying multi-task learning optimizer without additional tuning. Because tasks within a group update in compatible directions, multi-task learning improves model performance rather than impeding it. Empirical results on six different datasets show that this interference-aware graph-coloring approach consistently outperforms baselines and state-of-the-art multi-task optimizers. We provide extensive theory showing why grouping and sequential updates improve multi-task learning, with guarantees on descent, convergence, and the accurate identification of which tasks conflict or align.
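
The grouping step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the helper names (`cosine`, `conflict_graph`, `greedy_coloring`), the toy gradient vectors, the threshold value `tau=0.25`, and the largest-degree-first vertex order are all hypothetical choices made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def conflict_graph(grads, tau):
    """Connect tasks i and j when their gradient cosine is below -tau."""
    n = len(grads)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(grads[i], grads[j]) < -tau:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_coloring(adj):
    """Greedy coloring; vertex order (highest degree first) is an assumption."""
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color not used by a neighbor
            c += 1
        color[v] = c
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, []).append(v)
    return [sorted(groups[c]) for c in sorted(groups)]

# Toy per-task gradient estimates (made up for illustration).
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, 0.1], [0.0, 1.0]]
adj = conflict_graph(grads, tau=0.25)
groups = greedy_coloring(adj)
print(groups)
```

By construction, no conflict edge ends up inside a group, so each group can be activated on its own training step.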

Paper Structure

This paper contains 161 sections, 22 theorems, 126 equations, 16 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

If $G^\star$ is complete $m$-partite with parts $\{P_r\}_{r=1}^m$, then $\chi(G^\star)=m$.
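
The intuition behind this statement is standard: each part contains no internal edges, so one color per part suffices, while picking one vertex from each (non-empty) part yields a clique of size $m$, so fewer than $m$ colors cannot work. The brute-force check below illustrates this on small instances; it is a sanity check rather than a proof, and the function names and part sizes are hypothetical.

```python
from itertools import product

def complete_multipartite(part_sizes):
    """Return (n, edges) for the complete multipartite graph with given part sizes."""
    part_of = {}
    n = 0
    for r, size in enumerate(part_sizes):
        for _ in range(size):
            part_of[n] = r
            n += 1
    # Two vertices are adjacent exactly when they lie in different parts.
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if part_of[u] != part_of[v]]
    return n, edges

def chromatic_number(n, edges):
    """Smallest k admitting a proper k-coloring (exhaustive; tiny graphs only)."""
    if n == 0:
        return 0
    for k in range(1, n + 1):
        for assignment in product(range(k), repeat=n):
            if all(assignment[u] != assignment[v] for u, v in edges):
                return k
    return n

for sizes in [(3, 3), (2, 1, 3), (1, 1, 1, 1)]:
    n, edges = complete_multipartite(sizes)
    print(sizes, chromatic_number(n, edges))
```

In each case the reported chromatic number equals the number of parts.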

Figures (16)

  • Figure 1: Interference-aware scheduling pipeline: (a) For each task $T_i$ (circles $T_1\ldots T_6$), we smooth recent per-step gradients with an Exponential Moving Average (EMA); (b) From these EMA vectors we compute the pairwise cosine matrix. In the figure, cells outlined with red dashes mark pairs with cosine $<-\tau$, which are flagged as conflicts; (c) We build the conflict graph whose nodes are the tasks $T_i$ and whose red dashed edges connect exactly those pairs identified in (b); (d) We apply greedy graph coloring so that no conflict edge lies within a color, producing low-conflict groups. In the example shown, there are two groups: A (blue) and B (orange); (e) During training we activate one group per step. After every $R$ steps (here, $R=4$) we 'refresh' and run the pipeline again from step (a), updating the EMAs with the latest gradients.
  • Figure 2: Evaluation of SON-GOKU's speed with varying $R$ (8, 16, 32, 64) on different backbone widths, plotted against the joint training baseline ($R=\infty$). Highlighted regions represent the standard deviation across 15 separate trials. This data was collected during training on the Taskonomy Tiny subset.
  • Figure 3: Stacked area plot showing how the fraction of tasks assigned to each group size $|G|$ evolves over the refresh steps during training.
  • Figure 4: Conflict sparsity during training. Subfigure (a) plots the median average node degree (magenta line) with its 10th--90th percentile band (magenta band). Subfigure (b) shows the median edge density of the task conflict graph at each refresh step (blue line) with the 10th--90th percentile range across runs (blue band).
  • Figure 5: Grouping behavior throughout training. The blue line represents the number of active color groups at each training step. The orange line represents the median number of groups observed during each refresh period, with the shading showing the full range for that period. Subfigure (a) shows more detail from step 0 to step 5,000 of training, and Subfigure (b) shows the data from step 0 to step 80,000. Both plots have been lightly smoothed with moving medians to make them easier to interpret.
  • ...and 11 more figures
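
The refresh-and-activate loop from Figure 1, steps (a) and (e), can be sketched as follows. This is a minimal illustration under assumptions not stated in the captions: the EMA decay `beta`, the round-robin order over groups, restarting that order at every refresh, and the names `ema_update`, `run_schedule`, and `regroup` are all hypothetical. The `regroup` callback stands in for steps (a) through (d).

```python
def ema_update(ema, grad, beta=0.9):
    """One EMA step per coordinate: beta * old + (1 - beta) * new."""
    return [beta * e + (1.0 - beta) * g for e, g in zip(ema, grad)]

def run_schedule(num_steps, refresh_every, regroup):
    """Return the group activated at each step.

    regroup(step) must return a non-empty list of task groups; it is called
    every `refresh_every` steps, and groups are visited round-robin in between.
    """
    schedule = []
    groups, pos = [], 0
    for step in range(num_steps):
        if step % refresh_every == 0:
            groups = regroup(step)  # recompute the partition
            pos = 0                 # assumption: restart the cycle on refresh
        schedule.append(groups[pos % len(groups)])
        pos += 1
    return schedule

def toy_regroup(step):
    """Stand-in for the coloring pipeline; partitions are made up."""
    if step < 4:
        return [["T1", "T2", "T5"], ["T3", "T4", "T6"]]
    return [["T1", "T2"], ["T3", "T4"], ["T5", "T6"]]

schedule = run_schedule(num_steps=6, refresh_every=4, regroup=toy_regroup)
for step, group in enumerate(schedule):
    print(step, group)
```

With `refresh_every=4`, the first four steps alternate between the two initial groups, and the partition returned at step 4 takes over from there.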

Theorems & Definitions (41)

  • Definition B.1
  • Definition B.2
  • Proposition 1: Chromatic number of a complete multipartite graph
  • proof
  • Theorem 1: Identifiability via optimal coloring under model (B)
  • proof
  • Proposition 2: Identifiability via components under model (A)
  • Lemma 1: EMA vector concentration in directions of interest
  • proof
  • Lemma 2: Cosine stability under perturbations
  • ...and 31 more