Towards Principled Task Grouping for Multi-Task Learning
Chenguang Wang, Xuanhao Pan, Tianshu Yu
TL;DR
This work addresses positive and negative transfer in multi-task learning (MTL) with a principled task-grouping approach whose transfer gains are constructed without restrictive assumptions. It proposes a generic, budget-aware mixed-integer programming framework that assigns tasks to groups so as to maximize aggregated transfer gains under real-world resource constraints, together with complexity-aware strategies that reduce computation. Experiments on Taskonomy, CelebA, COP, and ETTm1 show consistent gains over single-task learning (STL), traditional MTL, and existing grouping methods, along with robustness to group-size constraints and substantial efficiency improvements from sampling and lazy collection. The approach scales in practice and applies across computer vision, combinatorial optimization, and time-series domains.
Abstract
Multi-task learning (MTL) aims to leverage shared information among tasks to improve learning efficiency and accuracy. However, MTL often struggles to manage positive and negative transfer between tasks, which can hinder performance improvements. Task grouping addresses this challenge by organizing tasks into meaningful clusters, maximizing beneficial transfer while minimizing detrimental interactions. This paper introduces a principled approach to task grouping in MTL that advances beyond existing methods by addressing key theoretical and practical limitations. Unlike prior studies, our method is theoretically grounded and does not depend on restrictive assumptions for constructing transfer gains. We also present a flexible mathematical programming formulation that accommodates a wide range of resource constraints, which enhances its versatility. Experimental results across diverse domains, including computer vision datasets, combinatorial optimization benchmarks, and time-series tasks, demonstrate the superiority of our method over extensive baselines, validating its effectiveness and general applicability in MTL without sacrificing efficiency.
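To make the grouping objective concrete, the following is a minimal illustrative sketch, not the paper's mixed-integer program. It assumes a hypothetical pairwise matrix `gains`, where `gains[i][j]` is the transfer gain task `j` receives from being trained with task `i`, assumes each task belongs to exactly one group, and uses two hypothetical budget parameters (`num_groups`, `max_group_size`). It replaces a solver with exhaustive search, which is only feasible for a handful of tasks.

```python
from itertools import product


def group_gain(group, gains):
    """Sum of directed pairwise gains gains[i][j] over distinct tasks in one group."""
    return sum(gains[i][j] for i in group for j in group if i != j)


def best_grouping(gains, num_groups, max_group_size):
    """Exhaustively search assignments of tasks to at most num_groups groups.

    Returns (best_value, groups). If no assignment satisfies the size limit,
    returns (-inf, None). Assumption: aggregated gain is the sum of pairwise
    gains inside each group; the paper's objective may differ.
    """
    n = len(gains)
    best_value, best_groups = float("-inf"), None
    # Each task receives one group label; empty groups are allowed.
    for labels in product(range(num_groups), repeat=n):
        groups = [[t for t in range(n) if labels[t] == g] for g in range(num_groups)]
        # Budget constraint (hypothetical): cap the size of every group.
        if any(len(g) > max_group_size for g in groups):
            continue
        value = sum(group_gain(g, gains) for g in groups)
        if value > best_value:
            best_value = value
            best_groups = sorted(g for g in groups if g)
    return best_value, best_groups


# Toy data (made up for illustration): tasks 0 and 1 help each other,
# tasks 2 and 3 help each other, and all cross pairs hurt.
gains = [
    [0.0, 0.3, -0.2, -0.1],
    [0.2, 0.0, -0.3, -0.2],
    [-0.1, -0.2, 0.0, 0.1],
    [-0.3, -0.1, 0.4, 0.0],
]
value, groups = best_grouping(gains, num_groups=2, max_group_size=3)
print(groups, value)
```

On this toy matrix the search places tasks 0 and 1 in one group and tasks 2 and 3 in the other, since every mixed grouping incurs negative cross-task gains. The number of candidate assignments grows as `num_groups ** n`, which is why a mathematical programming formulation with a dedicated solver is preferable beyond very small task sets.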
