On $f$-Divergence Principled Domain Adaptation: An Improved Framework

Ziqiao Wang, Yongyi Mao

TL;DR

This work improves the theoretical foundations of UDA proposed in Acuna et al. (2021) by refining their $f$-divergence-based discrepancy and additionally introducing a new measure, $f$-domain discrepancy ($f$-DD), which obtains novel target error and sample complexity bounds.

Abstract

Unsupervised domain adaptation (UDA) plays a crucial role in addressing distribution shifts in machine learning. In this work, we improve the theoretical foundations of UDA proposed in Acuna et al. (2021) by refining their $f$-divergence-based discrepancy and additionally introducing a new measure, $f$-domain discrepancy ($f$-DD). By removing the absolute value function and incorporating a scaling parameter, $f$-DD obtains novel target error and sample complexity bounds, allowing us to recover previous KL-based results and bridging the gap between algorithms and theory presented in Acuna et al. (2021). Using a localization technique, we also develop a fast-rate generalization bound. Empirical results demonstrate the superior performance of $f$-DD-based learning algorithms over previous works in popular UDA benchmarks.

Paper Structure (48 sections, 31 theorems, 87 equations, 4 figures, 10 tables)

This paper contains 48 sections, 31 theorems, 87 equations, 4 figures, 10 tables.

Key Result

Lemma 2.1

Let $\phi^*$ be the convex conjugate of $\phi$ (for a function $f:\mathcal{X}\to\mathbb{R}\cup\{-\infty, +\infty\}$, its convex conjugate is $f^*(y)\triangleq \sup_{x\in{\rm dom}(f)}\langle x, y\rangle - f(x)$), and $\mathcal{G}=\{g:\Theta\to{\rm dom}(\phi^*)\}$. Then
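
The lemma statement breaks off at this point in the listing. For orientation, the variational representation commonly attributed to Nguyen et al. (2010) has the form $\mathrm{D}_{\phi}(P\|Q)=\sup_{g\in\mathcal{G}}\mathbb{E}_{\theta\sim P}[g(\theta)]-\mathbb{E}_{\theta\sim Q}[\phi^*(g(\theta))]$; treat this as an assumption about what follows "Then", not as a quotation from the paper. The minimal Python sketch below checks that form numerically for the KL case, $\phi(x)=x\log x$ with $\phi^*(y)=e^{y-1}$, on two hypothetical three-point distributions. All function names and the distributions are illustrative.

```python
import math

def kl_divergence(p, q):
    """Direct KL(P||Q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def phi_conjugate_kl(y):
    """Convex conjugate of phi(x) = x*log(x), namely phi*(y) = exp(y - 1)."""
    return math.exp(y - 1.0)

def variational_objective(g, p, q):
    """E_P[g] - E_Q[phi*(g)] for a candidate function g, given as a list of values."""
    expect_p = sum(pi * gi for pi, gi in zip(p, g))
    expect_q = sum(qi * phi_conjugate_kl(gi) for qi, gi in zip(q, g))
    return expect_p - expect_q

# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# For KL, the supremum is attained at g(theta) = 1 + log(p(theta) / q(theta)).
g_opt = [1.0 + math.log(pi / qi) for pi, qi in zip(p, q)]
# Any other g (here the zero function) can only give a lower bound.
g_zero = [0.0, 0.0, 0.0]

kl = kl_divergence(p, q)
bound_opt = variational_objective(g_opt, p, q)
bound_zero = variational_objective(g_zero, p, q)

print(f"KL(P||Q)             = {kl:.6f}")
print(f"objective at g_opt   = {bound_opt:.6f}")
print(f"objective at g = 0   = {bound_zero:.6f}")
```

The objective at `g_opt` matches the directly computed KL divergence, while the zero function yields a strictly smaller value. This gap is why adversarial training schemes, such as the one pictured in Figure 2, maximize over an auxiliary network to approximate the supremum.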

Figures (4)

  • Figure 1: Failure of absolute discrepancy. The $y$-axis is the estimated $f$-divergence.
  • Figure 2: Illustration of the adversarial training framework for $f$-DD-based UDA. The framework includes the representation network ($h_{\rm rep}$), the main classifier ($h_{\rm cls}$), and the auxiliary classification network ($h'_{\rm cls}$). It jointly minimizes the empirical risk on the source domain and the approximated $f$-DD between the source and target domains.
  • Figure 3: Comparison between $\mathrm{D}_{\phi}^{h,\mathcal{H}}$ and $\widetilde{\mathrm{D}}_{\phi}^{h,\mathcal{H}}$. The $y$-axis is the estimated corresponding $f$-divergence and the $x$-axis is the number of iterations.
  • Figure 4: Visualization results of representations obtained by using t-SNE. The source domain (blue points) is U and the target domain (orange points) is M.

Theorems & Definitions (61)

  • Definition 2.1: $f$-divergence [yury2022information]
  • Lemma 2.1: [nguyen2010estimating]
  • Lemma 2.2: [agrawal2020optimal]
  • Definition 3.1
  • Remark 3.1
  • Lemma 3.1
  • Remark 3.2: Regarding $\lambda^*$
  • Theorem 3.1
  • Definition 4.1: $f$-DD
  • Remark 4.1
  • ...and 51 more