Dual-Balancing for Multi-Task Learning
Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, Ivor W. Tsang, James T. Kwok
TL;DR
DB-MTL addresses task imbalance in multi-task learning (MTL) by balancing both loss scales and gradient magnitudes. It applies a parameter-free logarithm transformation to each task loss to equalize loss scales, and it normalizes all task gradients to the maximum gradient norm, using exponential moving average (EMA) gradient estimates, so that update magnitudes are comparable across tasks. Across benchmarks in scene understanding, molecular property prediction, and image classification, DB-MTL consistently outperforms state-of-the-art baselines and, in many cases, matches or approaches single-task learning (STL) on harder tasks. It can also be combined effectively with other gradient-balancing methods. The approach improves training stability and reduces gradient conflicts, which suggests practical value for MTL in real-world settings. Future work includes accounting for gradient variance and a theoretical convergence analysis.
Abstract
Multi-task learning aims to learn multiple related tasks simultaneously and has achieved great success in various fields. However, the disparity in loss and gradient scales among tasks often leads to performance compromises, and the balancing of tasks remains a significant challenge. In this paper, we propose Dual-Balancing Multi-Task Learning (DB-MTL) to achieve task balancing from both the loss and gradient perspectives. Specifically, DB-MTL achieves loss-scale balancing by applying a logarithm transformation to each task loss, and it balances gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. Extensive experiments on a number of benchmark datasets demonstrate that DB-MTL consistently performs better than the current state-of-the-art.
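The two balancing steps described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the stabilizing constant eps, the EMA decay beta, and the choice to sum the rescaled task gradients are all assumptions made for illustration. The sketch takes per-task losses and per-task gradients with respect to the shared parameters as given, converts each gradient into the gradient of the log-transformed loss via the chain rule, smooths it with an EMA, and then rescales every task gradient to the largest norm among them.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def log_loss_grad(loss, grad, eps=1e-8):
    """Gradient of log(loss + eps), obtained from the gradient of loss.

    Chain rule: d/dtheta log(l + eps) = (d l / d theta) / (l + eps).
    eps is an assumed constant that guards against division by zero.
    """
    return [g / (loss + eps) for g in grad]


def dual_balanced_update(losses, grads, ema, beta=0.9, eps=1e-8):
    """One hypothetical dual-balancing step.

    losses: positive loss value for each task
    grads:  gradient of each task loss w.r.t. the shared parameters
    ema:    previous EMA gradient estimate for each task
    beta:   assumed EMA decay rate (not specified in the text above)

    Returns (combined_gradient, new_ema).
    """
    # Loss-scale balancing plus EMA smoothing of each task gradient.
    new_ema = []
    for loss, grad, prev in zip(losses, grads, ema):
        g_log = log_loss_grad(loss, grad, eps)
        new_ema.append(
            [beta * p + (1.0 - beta) * g for p, g in zip(prev, g_log)]
        )

    # Gradient-magnitude balancing: rescale every task gradient so that
    # its norm equals the maximum norm across tasks, then sum them.
    norms = [l2_norm(e) for e in new_ema]
    max_norm = max(norms)
    combined = [0.0] * len(new_ema[0])
    for e, n in zip(new_ema, norms):
        scale = max_norm / (n + eps)
        for i, x in enumerate(e):
            combined[i] += scale * x
    return combined, new_ema


if __name__ == "__main__":
    # Two toy tasks whose losses and gradients differ by four orders of magnitude.
    losses = [100.0, 0.01]
    grads = [[100.0, 0.0], [0.0, 0.01]]
    ema = [[0.0, 0.0], [0.0, 0.0]]
    combined, ema = dual_balanced_update(losses, grads, ema, beta=0.0)
    print(combined)
```

In this toy example the raw gradients differ by a factor of 10,000, yet after the log transformation and max-norm rescaling both tasks contribute roughly equally to the combined update. Setting beta=0.0 disables the smoothing so that the effect of the two balancing steps is easy to see in a single step.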
