Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality

Marko Medvedev, Gal Vardi, Nathan Srebro

Abstract

We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor. For increasing dimension, we give a generic characterization of the overfitting behavior for any scaling of the dimension with sample size. We use this to provide the first example of benign overfitting using the Gaussian kernel with sub-polynomial scaling dimension. All our results are under the Gaussian universality ansatz and the (non-rigorous) risk predictions in terms of the kernel eigenstructure.

Paper Structure

This paper contains 46 sections, 21 theorems, 172 equations, 1 figure.

Key Result

Theorem 3

Under the paper's assumption on the target function (label `targetassumption`), the following bounds hold for the predicted risk $\tilde{R}(\hat{f}_0)$ of the minimum norm interpolating solution of Gaussian KRR:

Figures (1)

  • Figure 1: Using the setup of the result labeled `gaussrisklowerbound` in the paper, we plot the dependence of the test error on the sample size for the Gaussian kernel ridgeless predictor. We consider $y = f^*(x) + \xi$ where $\xi \sim N(0,\sigma^2)$, $f^* = 10$, $\sigma^2$ is the noise level, and $x \sim \text{Unif}(S^{d-1})$. We compare the test error with the noise level (the Bayes risk) and the risk of the null predictor. We see that for all three cases, our predictions agree with the experiments.
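
The experiment described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the kernel convention $\exp(-\|x-x'\|^2/\tau^2)$, the default dimension, bandwidth, and sample sizes, and all function names are assumptions made for the example, and $f^* = 10$ is read as a constant target. The bandwidth is kept small so that the kernel matrix stays well conditioned and a direct linear solve interpolates the training labels; for larger bandwidths the matrix can become numerically singular.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def gaussian_kernel(a, b, bandwidth):
    """Kernel matrix exp(-||a_i - b_j||^2 / bandwidth^2) (assumed convention)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / bandwidth ** 2)

def ridgeless_fit_predict(x_train, y_train, x_test, bandwidth):
    """Minimum norm interpolant: solve K alpha = y with no ridge term."""
    K = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K, y_train)
    return gaussian_kernel(x_test, x_train, bandwidth) @ alpha

def run_experiment(n, d=3, sigma2=1.0, bandwidth=0.2, n_test=2000, seed=0):
    """Estimate test risks for y = f*(x) + xi with constant f* = 10."""
    rng = np.random.default_rng(seed)
    f_star = 10.0  # constant target, as read from the caption
    x_train = sample_sphere(n, d, rng)
    y_train = f_star + np.sqrt(sigma2) * rng.standard_normal(n)
    x_test = sample_sphere(n_test, d, rng)
    y_test = f_star + np.sqrt(sigma2) * rng.standard_normal(n_test)
    pred = ridgeless_fit_predict(x_train, y_train, x_test, bandwidth)
    return {
        "ridgeless": float(np.mean((pred - y_test) ** 2)),  # interpolant's test error
        "null": float(np.mean(y_test ** 2)),                # predictor f = 0
        "bayes": float(sigma2),                             # noise level
    }

if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, run_experiment(n))
```

Calling `run_experiment` over a range of `n` gives the three quantities the caption compares (test error, noise level, null risk); the numbers it produces depend on the assumed settings and are not the paper's results.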

Theorems & Definitions (35)

  • Theorem 3: Overfitting behavior of Gaussian kernel in fixed dimension
  • Definition 5: Lower and upper index
  • Theorem 7: Test risk upper bound for kernel ridgeless regression
  • Theorem 9: Test risk lower bound for any kernel ridgeless regression
  • Corollary 11: Dot-product kernels with polynomially increasing dimension, recovering the results of [ghorbani2021linearized, mei2022, misiakiewicz2022spectrum, barzilai2024, zhang2024]
  • Corollary 12: Inconsistency with dot-product kernels in logarithmically scaling dimension
  • Corollary 13: Benign overfitting with Gaussian kernel and sub-polynomial dimension
  • Remark 14: Allowed target functions
  • Definition 15: Cost of overfitting
  • Proposition 16: Necessary and sufficient condition for benign overfitting [zhou2023]
  • ...and 25 more