A survey on domain adaptation theory: learning bounds and theoretical guarantees
Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, Younès Bennani
TL;DR
The paper surveys theoretical guarantees for domain adaptation within transfer learning, focusing on how target performance can be bounded by source performance under distribution shift. It inventories a spectrum of bound paradigms, from divergences such as the $L^1$ distance, the ${\mathcal{H}\Delta\mathcal{H}}$-divergence, and the discrepancy distance to integral probability metrics (IPMs), including the Wasserstein distance and the maximum mean discrepancy (MMD), along with PAC-Bayesian and algorithmic-robustness perspectives. A central theme is the trade-off among source risk, domain divergence, and adaptation capacity (the non-estimable $\lambda$ term), plus hardness results showing when adaptation may be provably impossible or data-hungry. The survey also covers extensions to regression, semi-supervised settings, multi-source scenarios, and hypothesis-transfer learning, providing a comprehensive map of when and how domain adaptation can succeed and what remains open.
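As an illustration of this trade-off, the classical ${\mathcal{H}\Delta\mathcal{H}}$-divergence bound has roughly the following form. The notation is assumed here for the sketch and is not taken from the summary above: $\epsilon_S(h)$ and $\epsilon_T(h)$ denote the source and target risks of a hypothesis $h \in \mathcal{H}$, and $\mathcal{D}_S$, $\mathcal{D}_T$ denote the source and target marginal distributions.

$$
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr].
$$

The three terms on the right correspond to the source risk, the domain divergence, and the adaptation capacity (the lambda term). The last term depends on target labels, which is why it cannot be estimated in the unsupervised setting.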
Abstract
All well-known machine learning algorithms, in both supervised and semi-supervised learning, work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. Therefore, it has become necessary to develop approaches that reduce the need and the effort to obtain new labeled samples by exploiting data that are available in related areas and reusing them across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the capability of a human being to extrapolate knowledge across tasks to learn more efficiently. Despite the large number of different transfer learning scenarios, the main objective of this survey is to provide an overview of the state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning, called domain adaptation. In this sub-field, the data distribution is assumed to change across the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results related to the domain adaptation problem that cover learning bounds based on different statistical learning frameworks.
