Towards Principled Task Grouping for Multi-Task Learning

Chenguang Wang, Xuanhao Pan, Tianshu Yu

TL;DR

This work addresses positive and negative transfer in multi-task learning (MTL) with a principled task-grouping approach built on transfer gains that require no restrictive assumptions. It proposes a generic, budget-aware mixed-integer programming framework that assigns tasks to groups to maximize aggregated transfer gains while enforcing real-world constraints, and it provides complexity-aware strategies to reduce computation. Empirical results on Taskonomy, CelebA, combinatorial optimization problems (COP), and ETTm1 show consistent gains over single-task learning (STL), traditional MTL, and existing grouping methods, along with robustness to group-size constraints and substantial efficiency improvements via sampling and lazy collection. The approach offers practical scalability and applies across computer vision, combinatorial optimization, and time-series domains.

Abstract

Multi-task learning (MTL) aims to leverage shared information among tasks to improve learning efficiency and accuracy. However, MTL often struggles to effectively manage positive and negative transfer between tasks, which can hinder performance improvements. Task grouping addresses this challenge by organizing tasks into meaningful clusters, maximizing beneficial transfer while minimizing detrimental interactions. This paper introduces a principled approach to task grouping in MTL, advancing beyond existing methods by addressing key theoretical and practical limitations. Unlike prior studies, our method offers a theoretically grounded approach that does not depend on restrictive assumptions for constructing transfer gains. We also present a flexible mathematical programming formulation that accommodates a wide range of resource constraints, thereby enhancing its versatility. Experimental results across diverse domains, including computer vision datasets, combinatorial optimization benchmarks, and time series tasks, demonstrate the superiority of our method over extensive baselines, thereby validating its effectiveness and general applicability in MTL without sacrificing efficiency.
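To make the grouping objective concrete, the following is a minimal sketch of assigning tasks to groups so that the aggregated transfer gain is maximized under a group-size constraint. It is not the paper's formulation: the pairwise `gains` matrix, the function names, and the toy numbers are all hypothetical, and exhaustive search stands in for a mixed-integer programming solver, so it is only practical for a handful of tasks.

```python
from itertools import product


def group_score(assignment, gains):
    """Sum gains[i][j] over ordered pairs of distinct tasks that share a group.

    gains[i][j] is a hypothetical transfer gain from task i to task j.
    """
    n = len(assignment)
    return sum(
        gains[i][j]
        for i in range(n)
        for j in range(n)
        if i != j and assignment[i] == assignment[j]
    )


def best_grouping(gains, num_groups, max_group_size):
    """Exhaustively search all task-to-group assignments (illustration only)."""
    n = len(gains)
    best_assignment, best_score = None, float("-inf")
    for assignment in product(range(num_groups), repeat=n):
        # Resource constraint (assumed): no group may exceed max_group_size.
        sizes = [assignment.count(g) for g in range(num_groups)]
        if max(sizes) > max_group_size:
            continue
        score = group_score(assignment, gains)
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


# Toy gains: tasks 0 and 1 help each other, as do tasks 2 and 3;
# cross-pair transfer is negative.
gains = [
    [0.0, 0.5, -0.4, -0.2],
    [0.3, 0.0, -0.1, -0.3],
    [-0.2, -0.3, 0.0, 0.6],
    [-0.1, -0.2, 0.4, 0.0],
]
assignment, score = best_grouping(gains, num_groups=2, max_group_size=3)
print(assignment, round(score, 3))
```

On this toy input the search places tasks 0 and 1 in one group and tasks 2 and 3 in the other, since that split keeps every positive gain and avoids every negative one. Other budget constraints, such as a minimum group size, could be added as further feasibility checks inside the loop.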

Paper Structure (31 sections, 1 theorem, 22 equations, 11 figures, 7 tables)

Key Result

Proposition 1

Consider a multi-task learning setup with shared parameters $\phi \in \mathbb{R}^d$ and task-specific parameters $\theta_k$ for each task $T_k \in \mathcal{T}$. Let $L_k(\phi, \theta_k)$ be the loss function for task $T_k$. Suppose the model parameters are updated from $(\phi^t, \{\theta_k^t\})$ [...] Let $\mathcal{S}_{i \rightarrow j}^{t}$ and $\mathcal{S}_{A \rightarrow j}^{t}$ be the task transfer [...]

Figures (11)

  • Figure 1: Performance across grouping methods on each dataset. This figure presents loss reduction ($\uparrow$) for Taskonomy, total test error ($\downarrow$) for CelebA, total optimality gap ($\downarrow$) for COP, and total MAE ($\downarrow$) for ETTm1, segmented by various data splits. The dotted and dashed horizontal lines indicate the Single Task Learning (STL) and the best Multi-Task Learning (MTL) benchmarks.
  • Figure 1: Grouping and comparison results on the Taskonomy dataset.
  • Figure 2: Comparative performance under maximum and minimum group-size constraints. The figure compares our task grouping method against a random policy. Loss reduction ($\uparrow$) for Taskonomy, total error ($\downarrow$) for CelebA, total gap ($\downarrow$) for combinatorial optimization problems (COP), and total mean absolute error (MAE) ($\downarrow$) for time series forecasting tasks are evaluated across a range of group sizes, illustrating the adaptability of our method to both maximum and minimum size constraints.
  • Figure 2: Grouping and comparison results on the CelebA dataset.
  • Figure 3: Average classification error for 2-, 3-, and 4-split task groupings for the subset of 9 tasks in CelebA, compared across various methods (Ours (S), Ours, TAG, CS, STL, MTL, UW, GN, PCGrad) versus GPU hours.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Definition 1: Multitask Learning
  • Definition 2
  • Definition 3
  • Proposition 1
  • Remark: Practical Implications of the Bound's Magnitude
  • Proof