High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Yihang Chen; Fanghui Liu; Taiji Suzuki; Volkan Cevher

TL;DR

This paper investigates high-dimensional kernel ridge regression under covariate shift and analyzes importance weighting as a form of data-dependent implicit regularization. By deriving an asymptotic expansion of high-dimensional kernels and a bias-variance decomposition, it shows that re-weighting can reduce the variance, acting as a regularization when viewed through the spectrum of the re-weighted kernel, while the behavior of the bias depends on the chosen regularization scale. The results separate the intrinsic covariate-shift bias from the bias introduced by re-weighting, and establish that a well-chosen regularization can drive the re-weighting bias to zero, with the variance controlled by the spectral decay of the data-dependent kernel. Collectively, these findings offer theoretical guidance for deploying importance weighting in nonparametric, high-capacity settings where covariate shift occurs.
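Importance weighting re-weights each training point by the density ratio $w(x) = q(x)/p(x)$ between the target (test) covariate distribution $q$ and the source (training) distribution $p$. The minimal sketch below computes these weights when both distributions are Gaussian. That choice, the covariance values, and the function names are illustrative assumptions, not the paper's setup. It works in log space and uses a Cholesky factor for the Mahalanobis term to avoid overflow.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X (shape (n, d))."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)                  # cov = L @ L.T
    z = np.linalg.solve(L, (X - mean).T)         # z = L^{-1} (x - mean), shape (d, n)
    maha = np.sum(z**2, axis=0)                  # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log det(cov)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def importance_weights(X, mean_p, cov_p, mean_q, cov_q):
    """Density ratio w(x) = q(x) / p(x), evaluated in log space for stability."""
    return np.exp(gaussian_logpdf(X, mean_q, cov_q) - gaussian_logpdf(X, mean_p, cov_p))


# Hypothetical example: source p = N(0, I_d), target q = N(0, 0.8 I_d).
rng = np.random.default_rng(0)
d, n = 5, 100_000
X = rng.standard_normal((n, d))                  # training inputs drawn from p
w = importance_weights(X, np.zeros(d), np.eye(d), np.zeros(d), 0.8 * np.eye(d))
print(w.mean(), w.max())  # mean close to 1 since E_p[w] = 1; max <= 0.8 ** (-d / 2)
```

Because the target covariance here is smaller than the source covariance, the weights are bounded by $0.8^{-d/2}$. If the target were wider than the source, the Gaussian density ratio would be unbounded.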

Abstract

This paper studies kernel ridge regression in high dimensions under covariate shift and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high-dimensional kernels under covariate shift. Through a bias-variance decomposition, we show theoretically that the re-weighting strategy can decrease the variance. For the bias, we analyze regularization at an arbitrary or a well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, both the bias and the variance are characterized by the spectral decay of a data-dependent regularized kernel, namely the original kernel matrix combined with an additional re-weighting matrix, so the re-weighting strategy can be understood as a data-dependent regularization. In addition, our analysis provides asymptotic expansions of kernel functions/vectors under covariate shift, which may be of independent interest.
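The abstract describes the re-weighted estimator through the original kernel matrix combined with a re-weighting matrix. One way to make this concrete is to assume the standard importance-weighted kernel ridge objective $\frac{1}{n}\sum_i w(x_i)\,(y_i - f(x_i))^2 + \lambda\|f\|_{\mathcal{H}}^2$; the paper's exact empirical risk may use a different normalization. With $W = \mathrm{diag}(w(x_1),\dots,w(x_n))$ and $f = \sum_j \alpha_j k(x_j,\cdot)$, one minimizer is $\alpha = (WK + n\lambda I)^{-1}Wy$. This equals $\alpha = W^{1/2}(W^{1/2}KW^{1/2} + n\lambda I)^{-1}W^{1/2}y$, where the symmetric matrix $W^{1/2}KW^{1/2}$ acts as a data-dependent kernel. The sketch below checks that the two forms agree. The Gaussian kernel, the placeholder weights, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off


def weighted_krr_coef(K, y, w, lam):
    """alpha = (W K + n lam I)^{-1} W y for f = sum_j alpha_j k(x_j, .)."""
    n = K.shape[0]
    W = np.diag(w)
    return np.linalg.solve(W @ K + n * lam * np.eye(n), W @ y)


def weighted_krr_coef_symmetric(K, y, w, lam):
    """Same coefficients, via the data-dependent kernel W^{1/2} K W^{1/2}."""
    n = K.shape[0]
    s = np.sqrt(w)
    K_w = s[:, None] * K * s[None, :]            # W^{1/2} K W^{1/2}, symmetric PSD
    return s * np.linalg.solve(K_w + n * lam * np.eye(n), s * y)


rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.uniform(0.5, 2.0, size=n)                # placeholder weights w(x_i)
K = rbf_kernel(X, X, gamma=1.0 / d)
a1 = weighted_krr_coef(K, y, w, lam=1e-2)
a2 = weighted_krr_coef_symmetric(K, y, w, lam=1e-2)
print(np.allclose(a1, a2))  # prints True: both forms give the same coefficients
```

Predictions at a new point are $f(x) = k(x)^\top\alpha$ with $k(x) = (k(x,x_1),\dots,k(x,x_n))^\top$. The symmetric form is convenient because $W^{1/2}KW^{1/2}$ is positive semidefinite, so its eigenvalues directly describe the spectral decay of the re-weighted problem.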

Paper Structure (39 sections, 12 theorems, 108 equations, 1 figure, 2 tables)

Introduction
Contributions
Related works
High-dimensional kernel regression
Covariate shift
Random matrix theory
Notations
Problem Settings
RKHS and kernels
Interpolation and regression
Interpolation
Assumptions
Basic assumptions on kernel, data distribution
Assumptions on model
Summary of notations
...and 24 more sections

Key Result

Lemma 4.1

We consider the excess risk $\|\overline{f}_{\lambda,\bm{Z}}-f_\rho\|_q$, conditioned on $\bm{X}$, of our re-weighting estimator (the minimizer of the importance-weighted empirical risk, Eq. (iw_emp_risk_Z)); this excess risk admits a bias-variance decomposition.
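Lemma 4.1 conditions on the training inputs $\bm{X}$, so only the label noise is random. For squared error, a standard way to illustrate such a decomposition is to hold $\bm{X}$ fixed, resample the noise, and split the error into a squared bias $\|\mathbb{E}[\hat f] - f_\rho\|^2$ and a variance $\mathbb{E}\|\hat f - \mathbb{E}[\hat f]\|^2$, both measured on test points drawn from the target distribution $q$. The sketch below does this for an importance-weighted kernel ridge estimator with coefficients $\alpha = (WK + n\lambda I)^{-1}Wy$, which is an assumed standard form rather than the lemma's exact statement. Because this estimator is linear in $y$, the Monte Carlo estimates can be checked against closed forms. The regression function, Gaussian kernel, noise level, Gaussian source/target pair, and function names are hypothetical; $\lambda = n^{-1/2}$ mirrors the scaling in Figure 1.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def conditional_bias_variance(X, f_train, w, X_test, f_test, lam, gamma,
                              noise_std, n_rep, rng):
    """Bias^2 and variance of weighted KRR with the training inputs X held fixed.

    Errors use the empirical L2 norm over X_test (points drawn from q);
    only the label noise in y = f_train + noise is resampled.
    """
    n, m = X.shape[0], X_test.shape[0]
    W = np.diag(w)
    A = W @ rbf_kernel(X, X, gamma) + n * lam * np.eye(n)
    # Linear smoother: predictions on X_test are S @ y.
    S = rbf_kernel(X_test, X, gamma) @ np.linalg.solve(A, W)

    noise = noise_std * rng.standard_normal((n_rep, n))
    preds = (f_train[None, :] + noise) @ S.T           # shape (n_rep, m)
    mean_pred = preds.mean(axis=0)
    mc = {
        "bias2": np.mean((mean_pred - f_test) ** 2),
        "variance": np.mean(preds.var(axis=0)),         # ddof=0, so mse = bias2 + variance
        "mse": np.mean((preds - f_test) ** 2),
    }
    # Closed forms for zero-mean noise with variance noise_std**2.
    exact = {
        "bias2": np.mean((S @ f_train - f_test) ** 2),
        "variance": noise_std**2 * np.sum(S**2) / m,
    }
    return mc, exact


def f_rho(Z):
    """Hypothetical regression function."""
    return np.sin(Z[:, 0]) + 0.5 * Z[:, 1]


rng = np.random.default_rng(0)
d, n, m, s2 = 5, 100, 200, 0.81
X = rng.standard_normal((n, d))                         # source p = N(0, I)
X_test = np.sqrt(s2) * rng.standard_normal((m, d))      # target q = N(0, s2 I)
w = s2 ** (-d / 2) * np.exp(-0.5 * (1.0 / s2 - 1.0) * np.sum(X**2, axis=1))  # q/p
mc, exact = conditional_bias_variance(X, f_rho(X), w, X_test, f_rho(X_test),
                                      lam=n ** -0.5, gamma=1.0 / d,
                                      noise_std=0.5, n_rep=4000, rng=rng)
print(mc, exact)
```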

Figures (1)

Figure 1: We plot the empirical excess error, variance, and bias, together with the scaled theoretical upper bounds on the variance and bias (scaled V and scaled B), under different decays with $\lambda\propto n^{-1/2}$.
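The variance is described as controlled by the spectral decay of the data-dependent kernel. A common scalar summary of spectral decay in kernel ridge regression is the effective dimension $\mathcal{N}(\lambda) = \mathrm{tr}\big(M(M+\lambda I)^{-1}\big) = \sum_i \mu_i/(\mu_i+\lambda)$, where $\mu_i$ are the eigenvalues of a normalized kernel matrix $M$. The sketch below compares $\mathcal{N}(\lambda)$ for $K/n$ and for the re-weighted matrix $W^{1/2}KW^{1/2}/n$ at $\lambda = n^{-1/2}$, the scaling used in Figure 1; the division by $n$ assumes a $\frac{1}{n}$-averaged weighted empirical risk. This is a generic proxy, not the variance bound of Theorem 4.4. The Gaussian kernel, the Gaussian source/target pair, and the function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def effective_dimension(M, lam):
    """N(lam) = tr(M (M + lam I)^{-1}) = sum_i mu_i / (mu_i + lam) for symmetric PSD M."""
    mu = np.clip(np.linalg.eigvalsh(M), 0.0, None)    # clip round-off negatives
    return float(np.sum(mu / (mu + lam)))


def compare_spectra(n, d, s2, rng):
    """Effective dimension of K/n and of W^{1/2} K W^{1/2} / n at lam = n^{-1/2}."""
    X = rng.standard_normal((n, d))                    # source p = N(0, I)
    # Density ratio q/p for the hypothetical target q = N(0, s2 I).
    w = s2 ** (-d / 2) * np.exp(-0.5 * (1.0 / s2 - 1.0) * np.sum(X**2, axis=1))
    K = rbf_kernel(X, X, gamma=1.0 / d) / n
    s = np.sqrt(w)
    K_w = s[:, None] * K * s[None, :]                  # data-dependent kernel matrix
    lam = n ** -0.5
    return effective_dimension(K, lam), effective_dimension(K_w, lam)


rng = np.random.default_rng(0)
for n in (100, 200, 400, 800):
    n_plain, n_weighted = compare_spectra(n, d=5, s2=0.81, rng=rng)
    print(f"n={n:4d}  N(K/n)={n_plain:.2f}  N(W^1/2 K W^1/2 / n)={n_weighted:.2f}")
```

Whether re-weighting increases or decreases $\mathcal{N}(\lambda)$ depends on how the weights interact with the kernel spectrum, so the printed values should be read as a diagnostic rather than as a reproduction of Figure 1.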

Theorems & Definitions (21)

Definition 1
Definition 2: Capacity
Lemma 4.1
Lemma 4.2 (El Karoui, 2010)
Lemma 4.3
Theorem 4.4: Variance: Data-dependent regularization
Theorem 4.5: Bias under arbitrary $\lambda$
Corollary 4.5.1: Bias: $\overline{w}=w$
Theorem 4.6: Bias
Proof of Lemma 4.1 (bias-variance decomposition)
...and 11 more
