Nearest Neighbor Sampling for Covariate Shift Adaptation

François Portier; Lionel Truquet; Ikko Yamane

TL;DR

This work tackles covariate shift by eschewing weight estimation in favor of a nonparametric, hyperparameter-free approach that labels unlabeled target data using a $1$-nearest-neighbor sampler built from the source, enabling direct ERM on the augmented target. The authors establish a sharp error decomposition into marginal and conditional components, showing that the conditional $k$-NN error trades off bias and variance in a way that favors $k=1$, yielding near-parametric rates in many regimes and a quasi-linear runtime when accelerated with kd-trees. They connect the method to empirical risk minimization, proving consistency and rates for general and linear models under covariate shift, and demonstrate strong empirical performance and substantial speedups over kernel and weight-based baselines on synthetic and real data. Overall, the paper provides a scalable, theory-backed framework for covariate shift adaptation that leverages unlabeled target data through simple nearest-neighbor labeling, with practical impact for large-scale learning under distributional shift.

Abstract

Many existing covariate shift adaptation methods estimate sample weights given to loss values to mitigate the gap between the source and the target distribution. However, estimating the optimal weights typically involves computationally expensive matrix inversion and hyper-parameter tuning. In this paper, we propose a new covariate shift adaptation method which avoids estimating the weights. The basic idea is to directly work on unlabeled target data, labeled according to the $k$-nearest neighbors in the source dataset. Our analysis reveals that setting $k = 1$ is an optimal choice. This property removes the necessity of tuning the only hyper-parameter $k$ and leads to a running time quasi-linear in the sample size. Our results include sharp rates of convergence for our estimator, with a tight control of the mean square error and explicit constants. In particular, the variance of our estimators has the same rate of convergence as for standard parametric estimation despite their non-parametric nature. The proposed estimator shares similarities with some matching-based treatment effect estimators used, e.g., in biostatistics, econometrics, and epidemiology. Our experiments show that it achieves drastic reduction in the running time with remarkable accuracy.
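The labeling step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`nearest_source_index`, `nn_label_targets`, `nn_mean_estimate`) and the toy Gaussian data are hypothetical, and the neighbor search is brute force for clarity, whereas the quasi-linear running time mentioned above would require a tree-based search such as a kd-tree.

```python
import random

def nearest_source_index(x, source_x):
    """Index of the source covariate closest to x (squared Euclidean distance)."""
    return min(
        range(len(source_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, source_x[i])),
    )

def nn_label_targets(source_x, source_y, target_x):
    """Give each unlabeled target point the label of its 1-nearest source neighbor."""
    return [source_y[nearest_source_index(x, source_x)] for x in target_x]

def nn_mean_estimate(source_x, source_y, target_x, h=lambda x, y: y):
    """Average h(x, y) over the target points paired with their 1-NN labels."""
    labels = nn_label_targets(source_x, source_y, target_x)
    return sum(h(x, y) for x, y in zip(target_x, labels)) / len(target_x)

# Toy covariate shift (assumed for illustration): same conditional y | x,
# different marginal distribution of x in source and target.
rng = random.Random(0)
source_x = [(rng.gauss(0.0, 1.0),) for _ in range(1000)]
source_y = [2.0 * x[0] + rng.gauss(0.0, 0.1) for x in source_x]
target_x = [(rng.gauss(1.0, 0.5),) for _ in range(300)]

# Estimate of the target mean of y; the true value in this toy setup is 2.0.
estimate = nn_mean_estimate(source_x, source_y, target_x)
```

Passing a different `h`, for example a squared loss `lambda x, y: (y - f(x)) ** 2` for some predictor `f`, turns the same average into an empirical risk over the labeled target sample, which is the quantity an empirical risk minimizer would then optimize.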

Paper Structure (58 sections, 12 theorems, 109 equations, 7 figures, 3 tables, 2 algorithms)

Introduction
Problem setup
Proposed method
Computing time
Theoretical analysis
The key decomposition
Marginal sampling error
Notes.
Conditional sampling error of the nearest neighbor estimate
Notes.
Notes.
Applications to empirical risk minimization
Mathematical background
Consistency of general empirical risk minimizers
Convergence rate for linear least-squares estimators
...and 43 more sections

Key Result

Proposition 1

Suppose that $\hat{Q}$ satisfies the following strong law of large numbers: for each $h$ such that $Q(h) < \infty$, we have $\lim_{n \to \infty} \hat{Q}(h) = Q(h)$ almost surely. Then, if $m := m_n \to \infty$ as $n \to \infty$, we have the following central limit theorem: for each function $h$ such that $Q$ [...], where $V = \lim_{n \to \infty} \{ \hat{Q}(h^2) - \hat{Q}(h)^2 \}$.
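The variance expression $\hat{Q}(h^2) - \hat{Q}(h)^2$ in Proposition 1 can be checked numerically. The sketch below rests on an assumption, since the statement above is truncated: it takes the central limit theorem to concern $\sqrt{m}$ times the gap between the average of $h$ over $m$ i.i.d. draws from $\hat{Q}$ (a bootstrap sample in the sense of Definition 5) and $\hat{Q}(h)$. Here $\hat{Q}$ is simply the empirical measure of a fixed list of numbers, and all names (`empirical_mean`, `bootstrap_scaled_errors`) are hypothetical.

```python
import random

def empirical_mean(values, h):
    """Integral of h against the empirical measure of `values`."""
    return sum(h(v) for v in values) / len(values)

def bootstrap_scaled_errors(values, h, m, replicates, rng):
    """sqrt(m) * (bootstrap average of h - empirical average of h), repeated."""
    q_hat_h = empirical_mean(values, h)
    errors = []
    for _ in range(replicates):
        draw = rng.choices(values, k=m)  # m i.i.d. draws with replacement
        errors.append(m ** 0.5 * (empirical_mean(draw, h) - q_hat_h))
    return errors

rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stand-in for the support of Q-hat
h = lambda v: v

# Plug-in variance Q-hat(h^2) - Q-hat(h)^2 from the proposition.
plug_in_variance = empirical_mean(values, lambda v: h(v) ** 2) - empirical_mean(values, h) ** 2

# Second moment of the scaled bootstrap errors, which should be close to it.
errors = bootstrap_scaled_errors(values, h, m=200, replicates=2000, rng=rng)
simulated_variance = sum(e * e for e in errors) / len(errors)
```

Conditionally on `values`, the scaled error has mean zero and variance exactly equal to the plug-in expression for every `m`, so the two numbers should agree up to Monte Carlo noise.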

Figures (7)

Figure 1: Mean Squared Errors (MSE) for Experiment E1 (estimation of $\int y\, Q(dy)$). The horizontal axis is for the sample sizes $n$ ($= m$), and the vertical axis is for the mean absolute error of each estimate. The four figures are for different data dimensionalities.
Figure 2: Running times for Experiment E1. The horizontal axis is for the sample sizes $n$ ($= m$), and the vertical axis is for the mean running time of each method. The four figures are for different data dimensionalities.
Figure 3: Estimation errors for Experiment E2 (estimation of $\int (y - f_0(x) )^2\, Q(dx, dy)$)
Figure 4: Running times in Experiment E2
Figure 5: Mean Squared Errors (MSE) (subtracted by $0.0095$) for Experiment E3 (linear regression)
...and 2 more figures

Theorems & Definitions (21)

Definition 1: Source sample, source distribution
Definition 2: Target sample, target distribution
Definition 3: Covariate shift
Definition 4: Mean estimation under covariate shift
Definition 5: Bootstrap sample
Proposition 1
Proposition 2
Proposition 3
Theorem 1
Proposition 4
...and 11 more
