Computational Efficiency under Covariate Shift in Kernel Ridge Regression
Andrea Della Vecchia, Arnaud Mavakala Watusadisi, Ernesto De Vito, Lorenzo Rosasco
TL;DR
The paper tackles covariate shift in RKHS-based nonparametric regression by combining importance weighting (IW) with Nyström-based random subspaces to reduce the time and memory cost of kernel methods. It develops a theoretical framework with high-probability excess risk bounds that hold even when the importance weights may be unbounded, and shows that suitably chosen subspace sizes and regularization recover the best-known rates for kernel ridge regression (KRR) under covariate shift. The main contributions are (i) deriving novel risk bounds for Nyström approximations under distribution mismatch, (ii) extending the analysis to unknown weights via a weight-vs-proxy decomposition, and (iii) validating the approach with simulations and real-data benchmarks where W-Nyström KRR achieves the accuracy of full weighted KRR with substantial time and memory savings. The results demonstrate that random projection techniques can yield scalable, statistically optimal kernel methods in covariate-shift settings, enabling practical deployment on large datasets.
Abstract
This paper addresses the covariate shift problem in the context of nonparametric regression within reproducing kernel Hilbert spaces (RKHSs). Covariate shift arises in supervised learning when the input distributions of the training and test data differ, presenting additional challenges for learning. Although kernel methods have optimal statistical properties, their high computational demands in terms of time and, particularly, memory, limit their scalability to large datasets. To address this limitation, the main focus of this paper is to explore the trade-off between computational efficiency and statistical accuracy under covariate shift. We investigate the use of random projections where the hypothesis space consists of a random subspace within a given RKHS. Our results show that, even in the presence of covariate shift, significant computational savings can be achieved without compromising learning performance.
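To make the random-subspace idea concrete, the following is a minimal sketch of importance-weighted kernel ridge regression restricted to a Nyström subspace. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, uniform sampling of the m centers, known importance weights `w`, the `n * lam` scaling of the regularizer, and the function names `gaussian_kernel`, `weighted_nystrom_krr`, and `predict` are all choices made here for the example. The estimator is written as f(x) = sum_j alpha_j k(x, c_j), where the coefficients solve (K_nm^T W K_nm + n * lam * K_mm) alpha = K_nm^T W y.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def weighted_nystrom_krr(X, y, w, m, lam, sigma=1.0, seed=0):
    """Fit importance-weighted KRR on the span of m sampled centers.

    Assumptions of this sketch: centers are drawn uniformly without
    replacement from the training inputs, and w holds known importance
    weights (test density divided by training density at each X[i]).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=m, replace=False)]

    K_nm = gaussian_kernel(X, centers, sigma)        # shape (n, m)
    K_mm = gaussian_kernel(centers, centers, sigma)  # shape (m, m)

    # Normal equations of the weighted, regularized least-squares problem.
    A = K_nm.T @ (w[:, None] * K_nm) + n * lam * K_mm
    b = K_nm.T @ (w * y)

    # Least-squares solve tolerates an ill-conditioned system matrix.
    alpha = np.linalg.lstsq(A, b, rcond=None)[0]
    return centers, alpha


def predict(X_new, centers, alpha, sigma=1.0):
    """Evaluate the fitted function at new inputs."""
    return gaussian_kernel(X_new, centers, sigma) @ alpha
```

In this sketch only the n-by-m and m-by-m kernel blocks are formed, so memory scales as O(nm) rather than the O(n^2) needed for the full kernel matrix, and the linear system has size m instead of n.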
