Gradual Domain Adaptation: Theory and Algorithms

Yifei He; Haoxiang Wang; Bo Li; Han Zhao

Gradual Domain Adaptation: Theory and Algorithms

Yifei He, Haoxiang Wang, Bo Li, Han Zhao

TL;DR

This work addresses large distribution shifts in unsupervised domain adaptation by introducing gradual domain adaptation (GDA) with intermediate domains. It delivers a significantly tighter generalization bound for gradual self-training than prior theory, showing that placing intermediate domains along the Wasserstein geodesic minimizes path length and target error; it also characterizes an optimal number of steps and their spacing. Building on this theory, the authors propose GOAT, a Generative Gradual Domain Adaptation framework that creates intermediate domains via Optimal Transport in a learned feature space and applies gradual self-training along the resulting sequence. Empirically, GOAT outperforms standard GST and various UDA baselines on Rotated MNIST, Color-Shift MNIST, Portraits, and Cover Type, particularly when intermediate-domain data are scarce. The approach broadens the practical applicability of GDA, providing a principled route to improve robustness to distribution shifts in real-world settings, and code is made publicly available.

Abstract

Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/uiuctml/GOAT.

Gradual Domain Adaptation: Theory and Algorithms

TL;DR

Abstract

enerative Gradual D

main

daptation with Optimal

ransport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/uiuctml/GOAT.

Paper Structure (68 sections, 12 theorems, 58 equations, 7 figures, 7 tables, 1 algorithm)

This paper contains 68 sections, 12 theorems, 58 equations, 7 figures, 7 tables, 1 algorithm.

Introduction
Preliminaries
Notation
Problem Setup
Binary Classification
Gradually shifting distributions
Classifier and Loss
Unsupervised Domain Adaptation (UDA)
Gradual Domain Adaptation (GDA)
Remarks on Wasserstein Metrics
Gradual Self-Training
Theoretical Analyses
Error Difference over Distribution Shift
Comparison with kumar2020understanding
Remarks on Generality
...and 53 more sections

Key Result

Lemma 1

Consider two arbitrary measures $\mu, \nu$ over $\mathcal{X} \times \mathcal{Y}$. Then, for arbitrary classifier $h$ and loss function $\ell$ satisfying Assumption assum:Lipschitz-model, assum:Lipschitz-loss, the population loss of $h$ on $\mu$ and $\nu$ satisfies where $W_p$ is the Wasserstein-$p$ distance metric and $p\geq 1$.

Figures (7)

Figure 1: A schematic diagram comparing Unsupervised Domain Adaptation (UDA) vs. Gradual Domain Adaptation (GDA), using the example of Rotated MNIST. In GDA, given labeled data from a source domain, models are adapted to the target domain, with the help of unlabeled data from intermediate domains gradually shifting from the source to target.
Figure 2: An illustration of the divide-and-conquer strategy to address large data distribution shift (best viewed in color). The distribution shift between the source and target is divided into $T-1$ smaller pieces with (given or generated) unlabeled intermediate data. The model $h_t$ is gradually adapted in each step to reach the final solution.
Figure 3: An illustration of the optimal path in gradual domain adaptation, with a detailed explanation in Sec. \ref{['sec:optimal-path']}. The orange path is the geodesic connecting the source domain and target domain.
Figure 4: Samples from generated intermediate domains.
Figure 5: Illustration of the intermediate domain generation in GOAT. (a) without any given intermediate domain, (b) with one given intermediate domain.
...and 2 more figures

Theorems & Definitions (21)

Definition 1: $p$-Wasserstein Distance
Definition 2: Distribution Shifts
Lemma 1: Error Difference over Shifted Domains
Proposition 1: The stability of the ST algorithm
Definition 3: Complete Binary Trees
Definition 4: Sequential Rademacher Complexity
Example 1: Linear Models
Example 2: Neural Networks
Definition 5: Discrepancy Measure
Lemma 2: Discrepancy Bound
...and 11 more

Gradual Domain Adaptation: Theory and Algorithms

TL;DR

Abstract

Gradual Domain Adaptation: Theory and Algorithms

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (7)

Theorems & Definitions (21)