Efficient and Provable Algorithms for Covariate Shift
Deeksha Adil, Jarosław Błasiok
TL;DR
This work studies covariate-shifted mean estimation: estimating E_{x~p_te}[f(x)] from labeled samples drawn from p_tr and unlabeled samples drawn from p_te, where f is bounded in [-1,1]. It develops provably efficient algorithms that estimate the density ratio p_te/p_tr either directly via classification (logistic and kernel logistic regression) or by learning the two densities under regularity conditions, and then plug that estimate into a biased estimator of the target expectation whose bias can be controlled. Guarantees are given in several regimes: (i) Gaussian and isotropic Gaussian distributions, with concrete sample complexities; (ii) distribution classes learnable in total variation (TV) distance, for which covariate-shifted mean estimation has polynomial sample complexity; (iii) exponential-family and RKHS settings, where kernel logistic regression recovers the density ratio with quantifiable KL and TV guarantees; and (iv) general RKHS kernels, which allow density-ratio estimation over a broad function class. Together, the results link density estimation, classification-based density-ratio estimation, and kernel methods to concrete sample-complexity guarantees, with implications for transfer learning and other distribution-shift settings.
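To make the classification route concrete, here is a minimal sketch, not the paper's algorithm. It assumes one-dimensional Gaussians p_tr = N(0,1) and p_te = N(1,1), for which the log density ratio is linear in x, and an illustrative bounded function f(x) = tanh(x). The helper names `fit_logistic`, `density_ratio`, and `shifted_mean` are hypothetical. The logistic model is fit by plain gradient descent, and the final estimator is an unclipped importance-weighted average.

```python
import numpy as np

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))      # predicted P(sample is from test | x)
        theta -= lr * Xb.T @ (p - y) / len(y)      # gradient step on the mean log-loss
    return theta

def density_ratio(theta, X, n_tr, n_te):
    """Turn classifier odds into an estimate of p_te(x) / p_tr(x)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    # odds = P(test | x) / P(train | x); rescale by the class proportions
    return (n_tr / n_te) * np.exp(Xb @ theta)

def shifted_mean(x_tr, f_tr, x_te):
    """Estimate E_{x ~ p_te}[f(x)] from labeled train and unlabeled test samples."""
    X = np.vstack([x_tr, x_te])
    y = np.concatenate([np.zeros(len(x_tr)), np.ones(len(x_te))])  # 1 marks test
    theta = fit_logistic(X, y)
    w = density_ratio(theta, x_tr, len(x_tr), len(x_te))
    return float(np.mean(w * f_tr))                # importance-weighted average

# Illustrative data (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
n = 20000
x_tr = rng.normal(0.0, 1.0, size=(n, 1))           # samples from p_tr
x_te = rng.normal(1.0, 1.0, size=(n, 1))           # samples from p_te
f = lambda x: np.tanh(x[:, 0])                     # bounded in [-1, 1]

estimate = shifted_mean(x_tr, f(x_tr), x_te)
naive = float(np.mean(f(x_tr)))                    # ignores the shift entirely
reference = float(np.mean(f(x_te)))                # uses test labels; for checking only
print(estimate, naive, reference)
```

The `reference` value uses labels on the test samples, which the problem setting does not provide; it appears here only to sanity-check the weighted estimate against the unweighted training average.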
Abstract
Covariate shift, a widely used assumption in tackling {\it distributional shift} (when training and test distributions differ), focuses on scenarios where the distribution of the labels conditioned on the feature vector is the same, but the distributions of features in the training and test data differ. Despite the significance of covariate shift and the extensive work on it, theoretical guarantees for algorithms in this domain remain sparse. In this paper, we distill the essence of the covariate shift problem and focus on estimating the average $\mathbb{E}_{\tilde{\mathbf{x}}\sim p_{\mathrm{test}}}\mathbf{f}(\tilde{\mathbf{x}})$ of any unknown and bounded function $\mathbf{f}$, given labeled training samples $(\mathbf{x}_i, \mathbf{f}(\mathbf{x}_i))$ and unlabeled test samples $\tilde{\mathbf{x}}_i$; this is a core subroutine for several widely studied learning problems. We give several efficient algorithms with provable sample-complexity and computational guarantees. Moreover, we provide the first rigorous analysis of algorithms in this space when $\mathbf{f}$ is unrestricted, laying the groundwork for a solid theoretical foundation for covariate shift problems.
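For orientation, the standard change-of-measure identity below shows why knowing the density ratio suffices for this estimation problem. The symbol $p_{\mathrm{train}}$ for the training feature distribution is notation assumed here, and the identity requires $p_{\mathrm{test}}$ to be absolutely continuous with respect to $p_{\mathrm{train}}$.

```latex
\mathbb{E}_{\tilde{\mathbf{x}}\sim p_{\mathrm{test}}}\,\mathbf{f}(\tilde{\mathbf{x}})
  = \int \mathbf{f}(\mathbf{x})\, p_{\mathrm{test}}(\mathbf{x})\, d\mathbf{x}
  = \int \mathbf{f}(\mathbf{x})\,
      \frac{p_{\mathrm{test}}(\mathbf{x})}{p_{\mathrm{train}}(\mathbf{x})}\,
      p_{\mathrm{train}}(\mathbf{x})\, d\mathbf{x}
  = \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{train}}}
      \left[ \frac{p_{\mathrm{test}}(\mathbf{x})}{p_{\mathrm{train}}(\mathbf{x})}\,
             \mathbf{f}(\mathbf{x}) \right]
```

The right-hand side is an expectation over the training distribution, where labels $\mathbf{f}(\mathbf{x}_i)$ are available, so it can be approximated by a weighted average of the training labels once the ratio is known or estimated.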
