On $f$-Divergence Principled Domain Adaptation: An Improved Framework

Ziqiao Wang; Yongyi Mao

TL;DR

This work improves the theoretical foundations of UDA proposed in Acuna et al. (2021) by refining their $f$-divergence-based discrepancy and additionally introducing a new measure, $f$-domain discrepancy ($f$-DD), which yields novel target error and sample complexity bounds.

Abstract

Unsupervised domain adaptation (UDA) plays a crucial role in addressing distribution shifts in machine learning. In this work, we improve the theoretical foundations of UDA proposed in Acuna et al. (2021) by refining their $f$-divergence-based discrepancy and additionally introducing a new measure, $f$-domain discrepancy ($f$-DD). By removing the absolute value function and incorporating a scaling parameter, $f$-DD yields novel target error and sample complexity bounds, allowing us to recover previous KL-based results and to bridge the gap between the algorithms and theory presented in Acuna et al. (2021). Using a localization technique, we also develop a fast-rate generalization bound. Empirical results demonstrate the superior performance of $f$-DD-based learning algorithms over prior methods on popular UDA benchmarks.
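The abstract attributes the improvement to two changes: dropping the absolute value and adding a scaling parameter. The toy below is a hedged illustration of why both can matter for a variational quantity of the form $\mathbb{E}_\mu[g]-\mathbb{E}_\nu[\phi^*(g)]$, using the KL generator $\phi(x)=x\log x$ with $\phi^*(y)=e^{y-1}$ on hypothetical three-point distributions. It is our own construction, not the paper's definition of $f$-DD or its estimator, and all function names are illustrative: rescaling a deliberately mis-scaled critic by the best $t$ on a grid recovers the exact KL, and for $\mu=\nu$ (divergence $0$) the signed value of a poor critic stays negative while its absolute value is large.

```python
import math


def kl_divergence(mu, nu):
    # Exact KL(mu || nu) for discrete distributions on a shared finite support.
    return sum(p * math.log(p / q) for p, q in zip(mu, nu) if p > 0)


def kl_variational_value(mu, nu, g):
    # Signed value E_mu[g] - E_nu[phi*(g)] with phi(x) = x log x, phi*(y) = exp(y - 1).
    return (sum(p * gi for p, gi in zip(mu, g))
            - sum(q * math.exp(gi - 1.0) for q, gi in zip(nu, g)))


def best_scaled_value(mu, nu, g, scales):
    # Same objective maximized over rescaled critics t * g, t from a finite grid.
    return max(kl_variational_value(mu, nu, [t * gi for gi in g]) for t in scales)


# Hypothetical toy distributions (not data from the paper).
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
exact = kl_divergence(mu, nu)

# (1) Scaling: the KL-optimal critic is 1 + log(mu/nu); use twice that on purpose.
g_star = [1.0 + math.log(p / q) for p, q in zip(mu, nu)]
g = [2.0 * gi for gi in g_star]
scales = [k / 100 for k in range(301)]  # t in [0, 3] with step 0.01, includes t = 0.5
unscaled = kl_variational_value(mu, nu, g)       # t fixed at 1: a loose (negative) value
scaled = best_scaled_value(mu, nu, g, scales)    # best t on the grid recovers the KL

# (2) Absolute value: with identical distributions the divergence is 0, and the
# signed value of any critic is <= 0, but its absolute value can be made large.
same = [1 / 3, 1 / 3, 1 / 3]
g_bad = [-5.0, -5.0, -5.0]
signed_same = kl_variational_value(same, same, g_bad)
absolute_same = abs(signed_same)

print(f"KL = {exact:.4f}, t=1: {unscaled:.4f}, best t: {scaled:.4f}")
print(f"mu == nu: signed = {signed_same:.4f}, absolute = {absolute_same:.4f}")
```

Pushing the constant critic further from zero makes the absolute value grow without bound, even though the two distributions are identical.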

Paper Structure

This paper contains 48 sections, 31 theorems, 87 equations, 4 figures, and 10 tables.

Introduction
Preliminaries
Notations and UDA Setup
Background on $f$-divergence
Warm-Up: Refined Absolute $f$-Divergence Domain Discrepancy
New $f$-Divergence-Based DA Theory
Sharper Bounds via Localization
Algorithms and Experimental Results
Domain Adversarial Learning Algorithm
Experiments
Dataset
Discrepancy Measures
Baselines and Implementation Details
Boosted Benchmark Performance by $f$-DD
Failure of Absolute Discrepancy
...and 33 more sections

Key Result

Lemma 2.1

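Let $\phi^*$ be the convex conjugate of $\phi$, and $\mathcal{G}=\{g:\Theta\to{\rm dom}(\phi^*)\}$. Then, for probability distributions $\mu$ and $\nu$ on $\Theta$,

$$\mathrm{D}_\phi(\mu\|\nu)=\sup_{g\in\mathcal{G}}\ \mathbb{E}_{\theta\sim\mu}[g(\theta)]-\mathbb{E}_{\theta\sim\nu}[\phi^*(g(\theta))].$$

Here, for a function $f:\mathcal{X}\to\mathbb{R}\cup\{-\infty, +\infty\}$, its convex conjugate is $f^*(y)\triangleq \sup_{x\in{\rm dom}(f)}\langle x, y\rangle - f(x)$.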
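As a sanity check of the variational representation $\mathrm{D}_\phi(\mu\|\nu)=\sup_{g}\,\mathbb{E}_\mu[g]-\mathbb{E}_\nu[\phi^*(g)]$, the sketch below uses two hypothetical discrete distributions, so every expectation is a finite sum. It tries two standard generators, KL ($\phi(x)=x\log x$, $\phi^*(y)=e^{y-1}$) and Pearson $\chi^2$ ($\phi(x)=(x-1)^2$ on $x\ge 0$), and checks that the critic $g=\phi'(d\mu/d\nu)$ attains the divergence while random critics stay below it. The function names and toy numbers are our own illustrative choices.

```python
import math
import random


def f_divergence(mu, nu, phi):
    # D_phi(mu || nu) = E_nu[phi(dmu/dnu)] for discrete mu, nu with nu > 0 everywhere.
    return sum(q * phi(p / q) for p, q in zip(mu, nu))


def variational_value(mu, nu, g, phi_conj):
    # E_mu[g] - E_nu[phi*(g)]; by Fenchel-Young this never exceeds D_phi(mu || nu).
    return (sum(p * gi for p, gi in zip(mu, g))
            - sum(q * phi_conj(gi) for q, gi in zip(nu, g)))


def kl_phi(x):
    return x * math.log(x) if x > 0 else 0.0


def kl_conj(y):
    return math.exp(y - 1.0)


def chi2_phi(x):
    return (x - 1.0) ** 2


def chi2_conj(y):
    # Conjugate of (x - 1)^2 restricted to x >= 0.
    return y + y * y / 4.0 if y >= -2.0 else -1.0


# name -> (phi, phi*, optimal critic as a function of the ratio r = dmu/dnu)
DIVERGENCES = {
    "KL": (kl_phi, kl_conj, lambda r: 1.0 + math.log(r)),
    "chi2": (chi2_phi, chi2_conj, lambda r: 2.0 * (r - 1.0)),
}

mu = [0.5, 0.3, 0.2]  # hypothetical toy distributions
nu = [0.2, 0.3, 0.5]
rng = random.Random(0)

results = {}
for name, (phi, conj, g_opt) in DIVERGENCES.items():
    exact = f_divergence(mu, nu, phi)
    tight = variational_value(mu, nu, [g_opt(p / q) for p, q in zip(mu, nu)], conj)
    # Random critics should all stay at or below the divergence.
    best_random = max(
        variational_value(mu, nu, [rng.uniform(-3.0, 3.0) for _ in mu], conj)
        for _ in range(1000)
    )
    results[name] = (exact, tight, best_random)
    print(f"{name}: D = {exact:.4f}, optimal g = {tight:.4f}, best random g = {best_random:.4f}")
```

With samples instead of exact sums, the two expectations would be replaced by empirical means and $g$ by a learned function, which is where estimation error enters.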

Figures (4)

Figure 1: Failure of absolute discrepancy. The $y$-axis is the estimated $f$-divergence.
Figure 2: Illustration of the adversarial training framework for $f$-DD-based UDA. The framework includes the representation network ($h_{\rm rep}$), the main classifier ($h_{\rm cls}$), and the auxiliary classification network ($h'_{\rm cls}$). It jointly minimizes the empirical risk on the source domain and the approximated $f$-DD between the source and target domains.
Figure 3: Comparison between $\mathrm{D}_{\phi}^{h,\mathcal{H}}$ and $\widetilde{\mathrm{D}}_{\phi}^{h,\mathcal{H}}$. The $y$-axis is the estimated corresponding $f$-divergence and the $x$-axis is the number of iterations.
Figure 4: Visualization results of representations obtained by using t-SNE. The source domain (blue points) is U and the target domain (orange points) is M.
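Figure 2's caption describes a single objective combining the source risk with an approximated discrepancy. The following is a minimal, hypothetical sketch of how outputs of the main classifier and the auxiliary classifier could be combined into one scalar for a batch, in pure Python with hand-written logits in place of networks. Assumptions (ours, not the paper's): the critic score is the auxiliary head's logit at the main head's predicted class, the source plays $\mu$ and the target plays $\nu$ in the form $\mathbb{E}_\mu[g]-\mathbb{E}_\nu[\phi^*(g)]$ with $g=t\cdot s$, and the KL conjugate is used. The paper's actual $f$-DD (Definition 4.1) and its estimator may differ, and no gradients or training loop are shown.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax of one logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def source_risk(main_logits, labels):
    # Empirical cross-entropy of the main classifier on labeled source samples.
    return -sum(log_softmax(z)[y] for z, y in zip(main_logits, labels)) / len(labels)


def critic_scores(main_logits, aux_logits):
    # Hypothetical critic: the auxiliary head's logit at the main head's predicted class.
    scores = []
    for zm, za in zip(main_logits, aux_logits):
        pred = max(range(len(zm)), key=zm.__getitem__)
        scores.append(za[pred])
    return scores


def variational_discrepancy(src_scores, tgt_scores, t, conj):
    # E_source[t*s] - E_target[phi*(t*s)]; assumes source plays mu and target plays nu.
    e_src = sum(t * s for s in src_scores) / len(src_scores)
    e_tgt = sum(conj(t * s) for s in tgt_scores) / len(tgt_scores)
    return e_src - e_tgt


def uda_objective(src_main, src_labels, src_aux, tgt_main, tgt_aux, t, conj, weight=1.0):
    # In training, h_rep and h_cls would minimize this value while h'_cls (and t)
    # maximize the discrepancy term; only the forward value is computed here.
    d = variational_discrepancy(critic_scores(src_main, src_aux),
                                critic_scores(tgt_main, tgt_aux), t, conj)
    return source_risk(src_main, src_labels) + weight * d


def kl_conj(y):
    # phi*(y) = exp(y - 1) for the KL generator phi(x) = x log x.
    return math.exp(y - 1.0)


# Hypothetical head outputs for 2 source and 2 target samples over 3 classes.
src_main = [[2.0, 0.1, -1.0], [0.0, 1.5, 0.2]]
src_labels = [0, 1]
src_aux = [[0.8, 0.0, 0.0], [0.0, 0.6, 0.0]]
tgt_main = [[0.3, 0.2, 1.1], [1.0, 0.0, 0.0]]
tgt_aux = [[0.0, 0.0, 0.2], [0.1, 0.0, 0.0]]

value = uda_objective(src_main, src_labels, src_aux, tgt_main, tgt_aux, t=1.0, conj=kl_conj)
print(f"objective = {value:.4f}")
```

In a real implementation the logits would come from heads sharing $h_{\rm rep}$, and the inner maximization is commonly handled with a gradient-reversal layer or alternating updates; neither is shown here.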

Theorems & Definitions (61)

Definition 2.1: $f$-divergence yury2022information
Lemma 2.1: nguyen2010estimating
Lemma 2.2: agrawal2020optimal
Definition 3.1
Remark 3.1
Lemma 3.1
Remark 3.2: Regarding $\lambda^*$
Theorem 3.1
Definition 4.1: $f$-DD
Remark 4.1
...and 51 more