Gradient Harmonization in Unsupervised Domain Adaptation

Fuxiang Huang; Suqi Song; Lei Zhang

TL;DR

This work addresses gradient conflicts in unsupervised domain adaptation, where the domain alignment and classification losses can push gradients in opposing directions. It introduces Gradient Harmonization (GH) and GH++, which reorient conflicting task gradients so that the angle between them changes from obtuse to acute (GH) or to orthogonal (GH++), and derives an equivalent dynamic loss (UDA+GH/GH++) that reweights the two losses based on gradient geometry. The authors provide theoretical derivations, visualize gradient behavior, and report consistent improvements across multiple benchmarks (Office-31, Office-Home, VisDA-2017, Digits, DomainNet) and backbones, with GH++ often delivering larger gains. The approaches are plug-and-play, orthogonal to existing UDA methods, and extensible to other multi-task problems such as object detection and multi-modal retrieval, offering a practical route to more stable and effective cross-domain learning.

Abstract

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Many current methods learn feature representations that are both discriminative for classification and invariant across domains by simultaneously optimizing domain alignment and classification tasks. However, these methods often overlook a crucial challenge: the inherent conflict between the two tasks during gradient-based optimization. In this paper, we examine this issue and introduce two solutions, collectively called Gradient Harmonization (GH and GH++), to mitigate the conflict between the domain alignment and classification tasks. GH changes the angle between the gradients of the two tasks from obtuse to acute, thus resolving the conflict and balancing the two tasks in a coordinated manner. Yet this causes both tasks to deviate from their original optimization directions. We therefore propose an improved version, GH++, which adjusts the angle between the task gradients from obtuse to a right angle, making them orthogonal. This not only eliminates the conflict but also minimizes deviation from the original gradient directions. Finally, for optimization convenience and efficiency, we convert the gradient harmonization strategies into a dynamically weighted loss function by applying an integral operator to the harmonized gradient. Notably, GH/GH++ are orthogonal to UDA and can be seamlessly integrated into most existing UDA models. Theoretical insights and experimental analyses demonstrate that the proposed approaches not only enhance popular UDA baselines but also improve recent state-of-the-art models.
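The geometric idea in the abstract can be illustrated with a minimal sketch. It assumes a symmetric projection rule: each gradient has a scaled share of its component along the other gradient removed whenever their inner product is negative. This rule, the function names `dot` and `harmonize`, the `orthogonal` flag, and the scale `lam` are illustrative assumptions, not the paper's stated formulas. With `lam = 1` the resulting angle becomes acute. With `lam = 1 / (1 + sin(theta))`, where `theta` is the original angle, the two results become orthogonal under this particular rule.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def harmonize(g1, g2, orthogonal=False):
    """Return adjusted copies of g1 and g2 (hypothetical helper).

    Assumed rule: if the gradients conflict (negative inner product),
    subtract a scaled projection of each gradient onto the other.
    Non-conflicting gradients are returned unchanged.
    """
    d = dot(g1, g2)
    if d >= 0:
        return list(g1), list(g2)
    n1, n2 = dot(g1, g1), dot(g2, g2)  # both positive because d < 0
    lam = 1.0  # full projection: the adjusted angle becomes acute
    if orthogonal:
        # Smaller scale that makes the adjusted gradients orthogonal
        # under this rule (derived for this sketch only).
        sin_theta = math.sqrt(max(0.0, 1.0 - d * d / (n1 * n2)))
        lam = 1.0 / (1.0 + sin_theta)
    g1_h = [a - lam * d / n2 * b for a, b in zip(g1, g2)]
    g2_h = [b - lam * d / n1 * a for a, b in zip(g1, g2)]
    return g1_h, g2_h


if __name__ == "__main__":
    g1, g2 = [1.0, 0.0], [-1.0, 1.0]  # obtuse angle: dot(g1, g2) < 0
    acute = harmonize(g1, g2)
    ortho = harmonize(g1, g2, orthogonal=True)
    print("original inner product:", dot(g1, g2))
    print("after full projection:", dot(*acute))
    print("after orthogonal variant:", dot(*ortho))
```

In this toy case the orthogonal variant uses a smaller correction than the full projection, so each adjusted gradient stays closer to its original direction, which mirrors the motivation given for GH++.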

Paper Structure (26 sections, 5 theorems, 45 equations, 14 figures, 8 tables, 1 algorithm)


Introduction
Related Work
Unsupervised Domain Adaptation (UDA)
Multi Task Learning
Proposed Approach
Problem Definition
A General Framework of UDA
Gradient Harmonization (GH)
Essence and Insights of GH
Improved Version: GH++
Equivalent Model of UDA with GH/GH++
Experiments
Datasets
Implementation Details
Results on UDA
...and 11 more sections

Key Result

Lemma 1

Given two objective functions $\mathcal{L}_1(\Theta)$ and $\mathcal{L}_2(\Theta)$, let $g_1$ and $g_2$ denote their respective gradients, and let $\tilde{g}_1$ denote the result of harmonizing the gradient $g_1$. For minimizing the objective $\mathcal{L}_1(\Theta-\tilde{g}_1)+\mathcal{L}_2(\Theta-\ldots$ … where $\delta(\cdot)$ denotes the indicator function, whose value is 0 or 1, and the mathematical …
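As a hedged illustration of how harmonized gradients can correspond to a reweighted loss, suppose harmonization is a symmetric projection applied only when the two gradients conflict. The projection rule and the shorthand $d$ are assumptions made for this sketch, and the expressions for $\tau_1$ and $\tau_2$ follow from that assumption only. This is not a reconstruction of the lemma's statement.

```latex
% Assumption: projection-style harmonization, active only under conflict.
d = g_1^\top g_2, \qquad \delta = \delta(d < 0)

\tilde{g}_1 = g_1 - \delta \, \frac{d}{\|g_2\|^2} \, g_2, \qquad
\tilde{g}_2 = g_2 - \delta \, \frac{d}{\|g_1\|^2} \, g_1

\tilde{g}_1 + \tilde{g}_2
  = \underbrace{\left(1 - \delta \, \frac{d}{\|g_1\|^2}\right)}_{\tau_1} g_1
  + \underbrace{\left(1 - \delta \, \frac{d}{\|g_2\|^2}\right)}_{\tau_2} g_2
```

Under this assumption, if $\tau_1$ and $\tau_2$ are treated as constants at each step, the gradient of $\tau_1\mathcal{L}_1 + \tau_2\mathcal{L}_2$ equals the sum of the harmonized gradients. Because $d < 0$ in the conflicting case, both weights exceed 1 there, and both equal 1 otherwise.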

Figures (14)

Figure 1: Motivation of the proposed Gradient Harmonization. At point $\textbf{a}$, the black and red arrows point in the optimal gradient descent directions of domain alignment and classification, respectively. The obtuse angle between the gradients of the two tasks leads to an optimization conflict and further destroys the multi-task optimality.
Figure 2: Inner product distributions (histograms) for the two baselines, MCD and DWL, during training. The horizontal axis represents the inner product of the two gradients, and the vertical axis represents frequency (i.e., the number of occurrences of each inner product value). Both (a) and (b) contain obtuse angles (negative inner products), i.e., optimization conflicts, and the conflicts in (a) are more severe.
Figure 3: Illustration of how the GH module is used. The optimization objectives are a generic domain alignment loss and a classification loss. The GH module harmonizes the gradients of the two losses. The coefficients $\tau_1$ and $\tau_2$ are then derived from GH using the loss gradients $g_1$ and $g_2$ and are used to reweight the two losses. Finally, the reweighted loss functions are backpropagated to update the network parameters.
Figure 4: Overall idea of de-conflicting the gradients $g_1$ and $g_2$ of the two tasks. (a) shows the case where the angle between the two gradients is acute. (b) shows the case where the angle is obtuse. Under GH, case (b) needs to be processed, and (b1), (b2), and (b3) show the harmonization process. (b1) and (b2) detail gradient harmonization applied to $g_1$ and $g_2$, respectively. (b3) shows the final harmonization result. $\tilde{g}_1$ and $\tilde{g}_2$ denote the harmonized gradients of $g_1$ and $g_2$, respectively. After applying GH, the angle between the two gradients changes from obtuse to acute.
Figure 5: Gradient aggregation. (a) Gradient aggregation when the angle between the original gradients $g_1$ and $g_2$ is acute. (b) Gradient aggregation when the angle between the original gradients $g_1$ and $g_2$ is obtuse. (c) Gradient aggregation after GH for case (b). Comparing (b) and (c) shows that GH changes both the magnitude and the direction of the aggregated gradient $g$.
...and 9 more figures
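The inner-product diagnostic described in the Figure 2 caption can be sketched on a toy problem. Two quadratic losses with different minimizers stand in for the alignment and classification losses. The losses, the function names, and the hyperparameters are all hypothetical and chosen only for illustration. The sketch records the inner product of the two gradients at every step of plain gradient descent on the summed loss, then reports the fraction of steps with a negative inner product, which corresponds to an obtuse angle.

```python
def grad_quadratic(theta, center):
    """Gradient of the toy loss 0.5 * ||theta - center||^2."""
    return [t - c for t, c in zip(theta, center)]


def inner_products_during_training(theta, a, b, lr=0.1, steps=50):
    """Record g1 . g2 at each step of gradient descent on the summed loss."""
    records = []
    for _ in range(steps):
        g1 = grad_quadratic(theta, a)  # stand-in for the alignment gradient
        g2 = grad_quadratic(theta, b)  # stand-in for the classification gradient
        records.append(sum(x * y for x, y in zip(g1, g2)))
        theta = [t - lr * (x + y) for t, x, y in zip(theta, g1, g2)]
    return records


def conflict_rate(records):
    """Fraction of recorded steps whose inner product is negative."""
    return sum(1 for r in records if r < 0) / len(records)


if __name__ == "__main__":
    recs = inner_products_during_training([3.0, 2.0], [1.0, 0.0], [-1.0, 0.0])
    print("first inner product:", recs[0])
    print("last inner product:", recs[-1])
    print("conflict rate:", conflict_rate(recs))
```

In this toy setup the gradients agree far from both minimizers and start to conflict once the parameters lie between them. Binning `records` would give a histogram analogous in spirit to the one the caption describes.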

Theorems & Definitions (8)

Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 1
Theorem 2
