Gradient Deconfliction via Orthogonal Projections onto Subspaces For Multi-task Learning
Shijie Zhu, Hui Zhao, Tianshu Wu, Pengjie Wang, Hongbo Deng, Jian Xu, Bo Zheng
TL;DR
This paper addresses gradient conflicts in multi-task learning (MTL) with GradOPS, an orthogonal-projection method that enforces strong non-conflicting gradients. GradOPS projects each task gradient $g_i$ onto the subspace orthogonal to the span of the other task gradients. This guarantees that the final update $G'$ does not conflict with any original task gradient, and it allows simple, flexible trade-offs through a single hyperparameter $\alpha$ and task weights $w_i$. The authors prove convergence to Pareto stationary points and report state-of-the-art performance on diverse benchmarks, including multi-task classification, scene understanding, and large-scale recommendation, where GradOPS finds multiple solutions with different trade-offs among the tasks. GradOPS also matches or outperforms existing multi-objective optimization (MOO) methods while being simpler and, because each gradient is projected against all other task gradients at once, independent of task order. These results suggest that strong non-conflicting gradients are a practical foundation for robust, versatile MTL. Overall, the work offers a scalable, principled way to balance competing tasks and lets practitioners tailor trade-offs without extensive hyperparameter sweeps.
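The non-conflict guarantee can be checked in a few lines. The derivation below is a sketch that assumes the final update is a non-negative combination $G' = \sum_i w_i g_i'$ of the projected gradients, with $S_i$ denoting the span of the other task gradients; the paper's specific rule for setting $w_i$ from $\alpha$ is not reproduced here.

```latex
S_i = \operatorname{span}\{g_j : j \neq i\}, \qquad
g_i' = g_i - \operatorname{Proj}_{S_i}(g_i), \qquad
G' = \sum_{i} w_i \, g_i', \quad w_i \ge 0.

% For i \neq j: g_j \in S_i and g_i' \perp S_i, so g_j^\top g_i' = 0.
% For i = j: g_j = g_j' + \operatorname{Proj}_{S_j}(g_j) with g_j' \perp S_j, so g_j^\top g_j' = \|g_j'\|^2.
g_j^\top G' = \sum_i w_i \, g_j^\top g_i' = w_j \, g_j^\top g_j' = w_j \, \|g_j'\|^2 \;\ge\; 0
\qquad \text{for every task } j.
```

Under this assumption, every original task gradient has a non-negative inner product with $G'$, which is exactly the property that $G'$ does not conflict with any task.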
Abstract
Although multi-task learning (MTL) has become a preferred approach and has been successfully applied in many real-world scenarios, MTL models are not guaranteed to outperform single-task models on all tasks, mainly because of the negative effects of conflicting gradients among the tasks. In this paper, we fully examine the influence of conflicting gradients and further emphasize the importance and advantages of achieving non-conflicting gradients, which allow simple but effective trade-off strategies among the tasks with stable performance. Based on our findings, we propose Gradient Deconfliction via Orthogonal Projections onto Subspaces (GradOPS), which projects each task-specific gradient onto the orthogonal complement of the subspace spanned by the other task-specific gradients. Our method not only resolves all conflicts among the tasks but also effectively searches for diverse solutions toward different trade-off preferences among the tasks. We provide a theoretical analysis of convergence and thoroughly evaluate our algorithm on multiple benchmarks in various domains. Results demonstrate that our method effectively finds multiple state-of-the-art solutions with different trade-off strategies among the tasks on multiple datasets.
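The projection step described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: `orthogonal_component` and `gradops_update` are hypothetical names, task gradients are assumed to be flattened into vectors, and the paper's $\alpha$-based rule for choosing the weights $w_i$ is not reproduced (any non-negative weights can be passed instead).

```python
import numpy as np


def orthogonal_component(g, others):
    """Return the part of g orthogonal to the column space of `others`.

    g: shape (d,); others: shape (d, k), one column per other task gradient.
    """
    if others.shape[1] == 0:
        return g.copy()
    # Least-squares fit of g by the other gradients. The residual g - A c is
    # orthogonal to span(A), even when A is rank-deficient.
    coeffs, *_ = np.linalg.lstsq(others, g, rcond=None)
    return g - others @ coeffs


def gradops_update(grads, weights=None):
    """Hypothetical helper sketching a GradOPS-style projection step.

    grads: shape (T, d), one flattened gradient per task.
    weights: optional non-negative per-task weights w_i (uniform if None).
    Returns the combined update G' and the projected gradients g_i'.
    """
    grads = np.asarray(grads, dtype=float)
    num_tasks = grads.shape[0]
    w = np.ones(num_tasks) if weights is None else np.asarray(weights, dtype=float)
    if w.shape != (num_tasks,) or np.any(w < 0):
        raise ValueError("weights must be a non-negative vector of length T")
    projected = np.empty_like(grads)
    for i in range(num_tasks):
        # Columns are the gradients of every task except task i.
        others = np.delete(grads, i, axis=0).T
        projected[i] = orthogonal_component(grads[i], others)
    # Non-negative combination of projected gradients: for every task j,
    # g_j . G' = w_j * ||g_j'||^2 >= 0, so no task gradient is opposed.
    return w @ projected, projected


if __name__ == "__main__":
    # Two conflicting 2-D gradients: g_1 . g_2 = -1 < 0.
    g = np.array([[1.0, 0.0], [-1.0, 1.0]])
    G, P = gradops_update(g)
    print("projected gradients:", P)
    print("g_i . G':", g @ G)  # both entries are non-negative
```

Least squares is used instead of building an explicit orthonormal basis so that linearly dependent gradients need no special handling. A task gradient lying entirely in the span of the others (for example, three tasks in a 2-D parameter space) projects to zero; in neural networks the number of parameters is typically far larger than the number of tasks.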
