A survey on domain adaptation theory: learning bounds and theoretical guarantees

Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, Younès Bennani

TL;DR

The paper surveys theoretical guarantees for domain adaptation within transfer learning, focusing on how target performance can be bounded by source performance under distribution shifts. It inventories a spectrum of bound paradigms, from divergences such as L1, ${\mathcal{H}\Delta\mathcal{H}}$-divergence, and discrepancy to IPMs including Wasserstein and MMD, along with PAC-Bayesian and algorithmic-robustness perspectives. A central theme is the trade-off among source risk, domain divergence, and adaptation capacity (the nonestimable lambda term), plus hardness results highlighting when adaptation may be provably impossible or data-hungry. The survey also covers extensions to regression, semi-supervised settings, multi-source scenarios, and hypothesis-transfer learning, providing a comprehensive map of when and how domain adaptation can succeed and what remains open.
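As a sketch of the trade-off described above, the best-known bound of this kind (the ${\mathcal{H}\Delta\mathcal{H}}$-divergence bound) has the following population-level form. The symbols $R_S$, $R_T$, $S$, and $T$ are notation assumed here for the source and target risks and the source and target marginal distributions; the constants and the divergence term differ across the other frameworks the survey covers.

$$R_T(h) \;\le\; R_S(h) + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(S, T) + \lambda, \qquad \lambda = \min_{h' \in \mathcal{H}} \big[ R_S(h') + R_T(h') \big]$$

The $\lambda$ term is the combined error of the best single hypothesis on both domains. It depends on target labels, which is why it cannot be estimated in the unsupervised setting.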

Abstract

All well-known machine learning algorithms, whether supervised or semi-supervised, work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be rebuilt from newly collected data, which for some applications can be costly or impossible to obtain. It has therefore become necessary to develop approaches that reduce the need and the effort to obtain new labeled samples by exploiting data available in related areas and reusing them across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the human ability to extrapolate knowledge across tasks in order to learn more efficiently. Although there are many different transfer learning scenarios, the main objective of this survey is to provide an overview of the state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning called domain adaptation. In this sub-field, the data distribution is assumed to change between the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results related to the domain adaptation problem, covering learning bounds based on different statistical learning frameworks.

Paper Structure

This paper contains 85 sections, 44 theorems, 154 equations, 9 figures, 1 algorithm.

Key Result

Theorem 1

Let ${\bf X}$ be an input space, $Y = \lbrace -1, +1\rbrace$ the output space, and ${\cal D}$ their joint distribution. Let $S$ be a finite sample of size $m$ drawn i.i.d. from ${\cal D}$, and $\mathcal{H} = \lbrace h: {\bf X} \rightarrow Y \rbrace$ be a hypothesis class of VC dimension $\text{VC}(\mathcal{H})$ ...

Figures (9)

  • Figure 1: Comparison of standard supervised learning, transfer learning, and positioning of the domain adaptation.
  • Figure 2: Illustration of different loss functions.
  • Figure 3: Illustration of the Vapnik-Chervonenkis (VC) dimension. Here, half-planes in $\mathbb{R}^d$ with $d=2$ can shatter at most three points, i.e., realize all $2^3$ possible labelings of them, so the VC dimension is $d+1 = 3$.
  • Figure 4: Illustration of the ${\mathcal{H}\!\Delta\!\mathcal{H}}$-divergence when the hypothesis class consists of linear (top row) and nonlinear (bottom row) classifiers. Note that the indicated value of ${\mathcal{H}\!\Delta\!\mathcal{H}}$ is the error of the obtained classifier, before it is subtracted from 1 and the result multiplied by two as in the corresponding theorem (\ref{trm:hdiv_emp}).
  • Figure 5: Illustration of the optimal value for $\alpha$ as a function of the number of source and target labeled instances.
  • ...and 4 more figures
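
The Figure 4 caption relates the divergence to the error of a classifier trained to tell source points from target points. The following is a minimal sketch of that idea, not the survey's own procedure. It assumes one-dimensional data and a hypothesis class of threshold rules, and it uses the commonly stated empirical form "two times (one minus the smallest sum of per-domain error rates)". The function names `stump_domain_error` and `proxy_divergence` are hypothetical.

```python
from typing import Sequence


def stump_domain_error(source: Sequence[float], target: Sequence[float]) -> float:
    """Smallest sum of per-domain error rates over 1-D threshold rules.

    Assumption: the hypothesis class is {"x > t means target"} plus the
    flipped rules {"x <= t means target"}, for every threshold t.
    """
    thresholds = [float("-inf")] + sorted(set(source) | set(target))
    best = float("inf")
    for t in thresholds:
        # Rule "x > t means target": source points above t are misclassified,
        # target points at or below t are misclassified.
        src_wrong = sum(x > t for x in source) / len(source)
        tgt_wrong = sum(x <= t for x in target) / len(target)
        err = src_wrong + tgt_wrong
        # The flipped rule misclassifies exactly the complementary points,
        # so its summed error rate is 2 - err.
        best = min(best, err, 2.0 - err)
    return best


def proxy_divergence(source: Sequence[float], target: Sequence[float]) -> float:
    """Empirical divergence proxy: 2 * (1 - smallest summed error rate)."""
    return 2.0 * (1.0 - stump_domain_error(source, target))


if __name__ == "__main__":
    same = proxy_divergence([0, 1, 2, 3], [0, 1, 2, 3])
    overlap = proxy_divergence([0, 1, 2, 3], [2, 3, 4, 5])
    apart = proxy_divergence([0, 1, 2, 3], [10, 11, 12, 13])
    print(same, overlap, apart)
```

Under these assumptions the proxy is smallest when no threshold can separate the two samples and largest when some threshold separates them perfectly. A richer hypothesis class (as in the bottom row of Figure 4) can only lower the classifier's error and therefore raise the divergence.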

Theorems & Definitions (68)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • Definition 5
  • Definition 6
  • Theorem 2
  • Theorem 3: Germain et al. (2009); Germain et al. (2015)
  • Definition 7
  • ...and 58 more