Gradient Harmonization in Unsupervised Domain Adaptation
Fuxiang Huang, Suqi Song, Lei Zhang
TL;DR
This work addresses gradient conflicts in unsupervised domain adaptation, where the domain alignment and classification losses can push gradients in opposing directions. It introduces Gradient Harmonization (GH) and GH++, which reorient the task gradients so that the angle between them changes from obtuse to acute (GH) or to orthogonal (GH++), and derives an equivalent dynamic loss (UDA+GH/GH++) that reweights the two losses based on gradient geometry. The authors provide theoretical derivations, visualize gradient behavior, and demonstrate consistent improvements across multiple benchmarks (Office-31, Office-Home, VisDA-2017, Digits, DomainNet) and backbones, with GH++ often delivering larger gains. The approaches are plug-and-play, orthogonal to existing UDA methods, and extensible to other multi-task problems such as object detection and multi-modal retrieval, offering a practical route to more stable and effective cross-domain learning.
Abstract
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Many current methods focus on learning feature representations that are both discriminative for classification and invariant across domains by simultaneously optimizing domain alignment and classification tasks. However, these methods often overlook a crucial challenge: the inherent conflict between these two tasks during gradient-based optimization. In this paper, we delve into this issue and introduce two effective solutions, collectively called Gradient Harmonization and comprising GH and GH++, to mitigate the conflict between the domain alignment and classification tasks. GH alters the angle between the gradients of the two tasks from obtuse to acute, thus resolving the conflict and balancing the two tasks in a coordinated manner. Yet this causes both tasks to deviate from their original optimization directions. We therefore propose an improved version, GH++, which adjusts the angle between the task gradients from obtuse to a right angle, so that the gradients become orthogonal. This not only eliminates the conflict but also minimizes the deviation from the original gradient directions. Finally, for optimization convenience and efficiency, we convert the gradient harmonization strategies into a dynamically weighted loss function by applying an integral operator to the harmonized gradient. Notably, GH/GH++ are orthogonal to the choice of UDA method and can be seamlessly integrated into most existing UDA models. Theoretical insights and experimental analyses demonstrate that the proposed approaches not only enhance popular UDA baselines but also improve recent state-of-the-art models.
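To make the geometric idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a symmetric mutual-projection form, in which each gradient has a scaled copy of its component along the other gradient removed. The function name `harmonize`, the flag `orthogonal`, and the scaling factor `t = 1 / (1 + sin θ)` are illustrative assumptions: the factor is derived here only so that the two adjusted gradients end up at a right angle under this particular form, and is not taken from the abstract.

```python
import numpy as np


def harmonize(g1, g2, orthogonal=False):
    """Adjust two conflicting gradients (hypothetical helper, illustration only).

    If the gradients form an obtuse angle (negative dot product), each one has
    a scaled copy of its projection onto the other removed.
    orthogonal=False: full projection (t = 1), the adjusted pair is acute.
    orthogonal=True : smaller step t = 1 / (1 + sin(theta)), chosen here so
                      the adjusted pair is orthogonal (assumed form).
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    dot = g1 @ g2
    if dot >= 0:
        # No conflict: leave both gradients unchanged.
        return g1.copy(), g2.copy()

    n1, n2 = g1 @ g1, g2 @ g2  # squared norms
    t = 1.0
    if orthogonal:
        cos2 = dot * dot / (n1 * n2)
        sin_theta = np.sqrt(max(0.0, 1.0 - cos2))
        t = 1.0 / (1.0 + sin_theta)

    h1 = g1 - t * (dot / n2) * g2
    h2 = g2 - t * (dot / n1) * g1
    return h1, h2


# Toy gradients with an obtuse angle between them (135 degrees).
g_cls = np.array([1.0, 0.0])    # stands in for the classification gradient
g_dom = np.array([-1.0, 1.0])   # stands in for the domain alignment gradient

a1, a2 = harmonize(g_cls, g_dom)                    # acute variant
o1, o2 = harmonize(g_cls, g_dom, orthogonal=True)   # orthogonal variant

print("original dot  :", g_cls @ g_dom)
print("acute dot     :", a1 @ a2)
print("orthogonal dot:", o1 @ o2)
```

In this toy case the orthogonal variant moves each gradient less than the acute variant does, which matches the abstract's remark that GH++ reduces the deviation from the original gradient directions. In a real UDA model the two vectors would be the flattened gradients of the classification and alignment losses with respect to the shared parameters.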
