Adapting Auxiliary Losses Using Gradient Similarity
Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan
TL;DR
Auxiliary losses can improve data efficiency on a main task, but an auxiliary task can also hinder learning. The paper proposes using the cosine similarity between the main-task and auxiliary-task gradients as an adaptive weight that gates auxiliary updates: the auxiliary loss contributes when its gradient aligns with the main-task gradient and is blocked otherwise. The method is guaranteed to converge to critical points of the main task. Experiments on ImageNet subset classification, rotated MNIST, gridworld RL, and Atari games show that it can detect and block negative transfer and, in many cases, accelerate learning. This reduces the need for hand-tuned weighting of auxiliary losses in both supervised and reinforcement learning settings.
Abstract
One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main task and when it could start hurting. We propose to use the cosine similarity between gradients of tasks as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task and demonstrate the practical usefulness of the proposed algorithm in a few domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
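
The adaptive weighting described in the abstract can be illustrated with a minimal sketch. It assumes gradients are flat lists of floats and that the auxiliary weight is the cosine similarity clipped at zero, which is one plausible reading of "adaptive weight" here and not necessarily the authors' exact update rule. The function names `cosine_similarity` and `combined_gradient` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two flat gradient vectors.

    Returns 0.0 if either vector has zero norm (an assumption made here
    so that a vanished gradient contributes no auxiliary signal).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_gradient(grad_main, grad_aux):
    """Main gradient plus the auxiliary gradient scaled by max(0, cosine).

    Assumed rule: when the two gradients point in conflicting directions
    (cosine <= 0) the auxiliary gradient is dropped entirely.
    """
    weight = max(0.0, cosine_similarity(grad_main, grad_aux))
    return [gm + weight * ga for gm, ga in zip(grad_main, grad_aux)]


# Aligned gradients: the auxiliary gradient is added with weight 1.
aligned = combined_gradient([1.0, 0.0], [2.0, 0.0])
# Opposed gradients: the auxiliary gradient is blocked.
opposed = combined_gradient([1.0, 0.0], [-1.0, 0.0])
```

Under this assumed rule the update direction never has a negative inner product with the main-task gradient, which gives some intuition for why convergence to critical points of the main task is plausible.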
