High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher

TL;DR

This paper investigates high-dimensional kernel ridge regression under covariate shift and analyzes importance weighting as a form of data-dependent implicit regularization. By deriving an asymptotic expansion of kernels and a bias-variance decomposition, it shows that re-weighting can reduce variance through a regularization effect on the kernel spectrum, while the bias depends on the chosen regularization scale. The results separate the intrinsic covariate-shift bias from the re-weighting bias and establish that a well-chosen regularization can drive the re-weighting bias to zero, with the variance controlled by the spectral decay of the data-dependent kernel. Together, these findings offer theoretical guidance for deploying importance weighting in nonparametric, high-capacity settings under covariate shift.

Abstract

This paper studies kernel ridge regression in high dimensions under covariate shift and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high-dimensional kernels under covariate shift. Through a bias-variance decomposition, we show theoretically that the re-weighting strategy can decrease the variance. For the bias, we analyze regularization at an arbitrary scale and at a well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance are characterized by the spectral decay of a data-dependent regularized kernel, namely the original kernel matrix combined with an additional re-weighting matrix; the re-weighting strategy can thus be understood as a data-dependent regularization. In addition, our analysis provides an asymptotic expansion of kernel functions/vectors under covariate shift, which is of independent interest.
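
To make the "re-weighting as data-dependent regularization" reading concrete, here is a minimal sketch of textbook importance-weighted kernel ridge regression in NumPy. It is not taken from the paper: the Gaussian kernel, the Gaussian mean-shift density ratio, the toy regression target, and all function names are assumptions chosen for illustration. It checks numerically that the weighted solution $(\bm{W}\bm{K} + n\lambda \bm{I})^{-1}\bm{W}\bm{y}$ coincides with $(\bm{K} + n\lambda \bm{W}^{-1})^{-1}\bm{y}$, so each sample effectively receives its own ridge penalty $n\lambda / w_i$.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) (illustrative choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def weighted_krr_fit(K, y, w, lam):
    """Minimize (1/n) sum_i w_i (f(x_i) - y_i)^2 + lam * ||f||_H^2.

    With f = sum_j alpha_j k(., x_j), the first-order condition gives
    (W K + n * lam * I) alpha = W y, where W = diag(w).
    """
    n = K.shape[0]
    return np.linalg.solve(w[:, None] * K + n * lam * np.eye(n), w * y)


def weighted_krr_fit_regularized_form(K, y, w, lam):
    """Same solution written as (K + n * lam * W^{-1}) alpha = y (needs w_i > 0)."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.diag(1.0 / w), y)


rng = np.random.default_rng(0)
n, d = 200, 50

# Hypothetical setup: source inputs ~ N(0, I), target inputs ~ N(mu, I).
X = rng.normal(size=(n, d))
mu = np.full(d, 0.05)

# Density ratio target/source for this Gaussian mean shift.
w = np.exp(X @ mu - 0.5 * mu @ mu)

# Toy regression target (assumption, for illustration only).
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

K = rbf_kernel(X, X, gamma=1.0 / d)
lam = n ** -0.5  # same scaling as the lambda used in Figure 1

alpha_weighted = weighted_krr_fit(K, y, w, lam)
alpha_regularized = weighted_krr_fit_regularized_form(K, y, w, lam)

# Largest coordinate-wise difference between the two forms of the solution.
max_gap = float(np.max(np.abs(alpha_weighted - alpha_regularized)))
print("max |alpha_weighted - alpha_regularized| =", max_gap)
```

In the second form the kernel matrix is left untouched and the weights enter only through the diagonal term, which is one way to see why samples with small importance weight are regularized more heavily.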

Paper Structure

This paper contains 39 sections, 12 theorems, 108 equations, 1 figure, 2 tables.

Key Result

Lemma 4.1

We consider the excess risk $\|\overline{f}_{\lambda,\bm{Z}}-f_\rho\|_q$, conditioned on $\bm{X}$, for our re-weighting estimator (eq:iw_emp_risk_Z), which admits the following bias-variance decomposition:

Figures (1)

  • Figure 1: Empirical excess error, variance, and bias, together with the scaled theoretical upper bounds (scaled V and scaled B), under different decays with $\lambda\propto n^{-1/2}$.

Theorems & Definitions (21)

  • Definition 1
  • Definition 2: Capacity
  • Lemma 4.1
  • Lemma 4.2: El Karoui (2010)
  • Lemma 4.3
  • Theorem 4.4: Variance: Data-dependent regularization
  • Theorem 4.5: Bias under arbitrary $\lambda$
  • Corollary 4.5.1: Bias: $\overline{w}=w$
  • Theorem 4.6: Bias
  • Proof of Lemma \ref{lemma:decomposition}
  • ...and 11 more