Learning bounds for doubly-robust covariate shift adaptation
Jeonghwan Lee, Cong Ma
TL;DR
This work addresses learning under covariate shift by analyzing, in a structure-agnostic framework, a doubly-robust (DR) estimator that combines density-ratio weighting with a pilot regression. It delivers the first non-asymptotic, high-probability learning bounds for the DR estimator, which depend only on the $L^2$-errors of the nuisance pilot estimates and the Rademacher complexity of the model class, and it shows fast $1/n$-type rates in well-specified parametric settings, governed by the Fisher information mismatch between the source and target distributions. The results connect finite-sample out-of-distribution generalization bounds with asymptotic efficiency properties, offering practical guidance on data collection and nuisance estimation under covariate shift. Overall, the paper unifies structure-agnostic guarantees and parametric analysis to provide a comprehensive theoretical foundation for DR covariate shift adaptation.
Abstract
Distribution shift between the training domain and the test domain poses a key challenge for modern machine learning. An extensively studied instance is \emph{covariate shift}, where the marginal distribution of the covariates differs across domains while the conditional distribution of the outcome given the covariates remains the same. The doubly-robust (DR) estimator, recently introduced by \cite{kato2023double}, combines density ratio estimation with a pilot regression model and achieves asymptotic normality and $\sqrt{n}$-consistency even when the pilot estimates converge slowly. However, prior work has focused exclusively on asymptotic results, leaving open the question of non-asymptotic guarantees for the DR estimator. This paper establishes the first non-asymptotic learning bounds for DR covariate shift adaptation. Our main contributions are two-fold: (\romannumeral 1) we establish \emph{structure-agnostic} high-probability upper bounds on the excess target risk of the DR estimator that depend only on the $L^2$-errors of the pilot estimates and the Rademacher complexity of the model class, without assuming any specific procedure for obtaining the pilot estimates; and (\romannumeral 2) under \emph{well-specified parameterized models}, we analyze DR covariate shift adaptation using modern techniques for the non-asymptotic analysis of MLE, obtaining bounds whose key terms are governed by the Fisher information mismatch between the source and target distributions. Together, these findings bridge asymptotic efficiency properties and finite-sample out-of-distribution generalization bounds, providing a comprehensive theoretical underpinning for DR covariate shift adaptation.
