Learning bounds for doubly-robust covariate shift adaptation

Jeonghwan Lee, Cong Ma

TL;DR

This work addresses the challenge of learning under covariate shift by analyzing a doubly-robust (DR) covariate-shift estimator that combines density-ratio weighting with a pilot regression under a structure-agnostic framework. It delivers the first non-asymptotic, high-probability learning bounds for the DR estimator that depend only on the $L^2$-errors of nuisance pilots and the Rademacher complexity of the model class, and it shows fast $1/n$-type rates in well-specified parametric settings governed by Fisher information mismatch between the source and target. The results connect finite-sample out-of-distribution generalization bounds with asymptotic efficiency properties, offering practical guidance on data collection and nuisance estimation under covariate shift. Overall, the paper unifies structure-agnostic guarantees and parametric analysis to provide a comprehensive theoretical foundation for DR covariate shift adaptation.

Abstract

Distribution shift between the training domain and the test domain poses a key challenge for modern machine learning. An extensively studied instance is covariate shift, where the marginal distribution of the covariates differs across domains while the conditional distribution of the outcome remains the same. The doubly-robust (DR) estimator, recently introduced by \cite{kato2023double}, combines density ratio estimation with a pilot regression model and demonstrates asymptotic normality and $\sqrt{n}$-consistency, even when the pilot estimates converge slowly. However, prior work has focused exclusively on asymptotic results and has left open the question of non-asymptotic guarantees for the DR estimator. This paper establishes the first non-asymptotic learning bounds for DR covariate shift adaptation. Our main contributions are two-fold: (i) we establish structure-agnostic high-probability upper bounds on the excess target risk of the DR estimator that depend only on the $L^2$-errors of the pilot estimates and the Rademacher complexity of the model class, without assuming specific procedures for obtaining the pilot estimates; and (ii) under well-specified parameterized models, we analyze DR covariate shift adaptation using modern techniques for the non-asymptotic analysis of MLE, obtaining bounds whose key terms are governed by the Fisher information mismatch between the source and target distributions. Together, these findings bridge asymptotic efficiency properties and finite-sample out-of-distribution generalization bounds, providing a comprehensive theoretical underpinning for DR covariate shift adaptation.
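
To make the "density ratio plus pilot regression" combination concrete, the following is a minimal sketch of a generic doubly-robust estimate of the target risk. It is an illustration under stated assumptions, not necessarily the exact estimator analyzed in the paper: the function name `dr_target_risk` and its arguments are hypothetical, `w_src` stands for an estimated density ratio evaluated at the labeled source covariates, and `g_src` / `g_tgt` stand for a pilot estimate of the conditional expected loss evaluated at the source and unlabeled target covariates.

```python
import numpy as np


def dr_target_risk(loss_src, w_src, g_src, g_tgt):
    """Generic doubly-robust estimate of the target risk (illustrative sketch).

    loss_src : observed losses on the labeled source samples
    w_src    : estimated density ratio at the source covariates (assumed given)
    g_src    : pilot estimate of the conditional expected loss at source covariates
    g_tgt    : the same pilot estimate evaluated at unlabeled target covariates
    """
    loss_src = np.asarray(loss_src, dtype=float)
    w_src = np.asarray(w_src, dtype=float)
    g_src = np.asarray(g_src, dtype=float)
    g_tgt = np.asarray(g_tgt, dtype=float)

    # Plug-in term: average the pilot's predicted loss over target covariates.
    plug_in = g_tgt.mean()
    # Correction term: importance-weighted residual of the pilot on source data.
    correction = np.mean(w_src * (loss_src - g_src))
    return plug_in + correction
```

The two terms show where the double robustness comes from: if the pilot matches the true conditional expected loss, the correction term has mean zero whatever the weights are, and if the weights equal the true density ratio, the correction term removes the pilot's bias in expectation.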

Paper Structure

This paper contains 42 sections, 17 theorems, 167 equations.

Key Result

Theorem 4.1

Under Assumptions \ref{assumption:covariate_shift}--\ref{assumption:black_box_estimates}, the doubly-robust (DR) estimator \eqref{eqn:dr_estimator_v1} achieves the $\mathbb{Q}$-estimation error with probability at least $1 - \delta$ under the probability measure $\mathbb{P}^{\otimes n_{\mathbb{P}}} \otimes \mathbb{Q}_{X}^{\otimes n_{\mathbb{Q}}}$.

Theorems & Definitions (20)

  • Remark 4.1
  • Definition 4.1: Rademacher complexity
  • Theorem 4.1: Structure-agnostic upper bound I of the DR estimator
  • Proposition 4.1
  • Remark 4.2
  • Theorem 5.1: Informal, see Theorem \ref{thm:detailed_upper_bound_alg:dr_covariate_shift_adaptation_v2}
  • Lemma A.1: The Ledoux-Talagrand contraction principle
  • Lemma A.2: Classical Talagrand's concentration inequality
  • Lemma A.3
  • Lemma A.4
  • ...and 10 more