Does data interpolation contradict statistical optimality?

Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov

TL;DR

This work demonstrates that interpolating estimators built from singular kernels can achieve the classical minimax rates for nonparametric regression with Hölder-smooth functions, challenging the notion that interpolation must harm statistical performance. By decomposing risk into bias and variance and carefully balancing bandwidth, the authors prove finite-sample, nonparametric risk bounds of the form $E\|f_n-f\|^2_{L_2(P_X)} \le C n^{-2\beta/(2\beta+d)}$ for $\beta\in(0,2]$, with extensions to higher smoothness under density regularity. The results cover both pointwise MSE and integrated risk, and imply optimal behavior for square-loss prediction despite data interpolation. The findings offer a conceptual bridge between interpolating machine learning models, like deep networks, and classical statistical optimality, and suggest broader applicability of interpolation-based estimators in nonparametric settings.

Abstract

We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.

Paper Structure

This paper contains 9 sections, 5 theorems, 46 equations, 6 figures.

Key Result

Theorem 1

Assume that $f\in\Sigma(\beta,L_f)$ for $\beta\in(0,1]$, $L_f>0$. Let Assumptions $(A1)$ and $(A2)$ be satisfied, and $0<a<d/2$. Then for any fixed $x_0\in{\mathbb R}^d$ in the support of $p$, the estimator (eq:NW_def_precise) with kernel (eq:def_our_kernel) and bandwidth $h=n^{-\frac{1}{2\beta+d}}$ satisfies $\mathbb{E}\big[(f_n(x_0)-f(x_0))^2\big]\le C n^{-\frac{2\beta}{2\beta+d}}$, where $C>0$ is a constant that does not depend on $n$.
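
The bandwidth in Theorem 1 is the one that balances bias against variance. As a heuristic sketch only, assume the usual kernel-regression orders: squared bias of order $h^{2\beta}$ and variance of order $1/(nh^d)$. The constants $C_1, C_2$ below are placeholders, not quantities taken from the paper.

$$
\mathbb{E}\big[(f_n(x_0)-f(x_0))^2\big] \le C_1 h^{2\beta} + \frac{C_2}{n h^d},
\qquad
h^{2\beta} = \frac{1}{n h^d} \iff h = n^{-\frac{1}{2\beta+d}} .
$$

With this choice both terms are of order $n^{-\frac{2\beta}{2\beta+d}}$, which is the rate quoted in the TL;DR.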

Figures (6)

  • Figure 1: Interpolation with $K\left(u\right) = \left\|u\right\|^{-a} {\mathbf I}{\left\{\left\|u\right\|\leq 1\right\}}$, $a=0.49$, and various values of $h$.
  • Figure 2: Interpolation with $K\left(u\right) = \left\|u\right\|^{-a} [1-\left\|u\right\|]^2_+$, $a=0.49$, and various values of $h$.
  • Figure 3: Comparison: non-singular Epanechnikov kernel $K\left(u\right) = (3/4)(1-\left\|u\right\|^2) {\mathbf I}{\left\{\left\|u\right\|\leq 1\right\}}$.
  • Figure 4: Comparison: non-singular Gaussian kernel $K\left(u\right) = (1/\sqrt{2\pi})\exp\left\{-\left\|u\right\|^2\right\}$. Note the altered choices of $h$.
  • Figure 5: Interpolation with $K\left(u\right) = \left\|u\right\|^{-a} [1-\left\|u\right\|]^2_+$, $a=0.49$, for binary-valued $Y$.
  • ...and 1 more figure
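
The interpolation behavior described in the captions of Figures 1 and 2 can be sketched in a few lines. This is a minimal one-dimensional illustration in pure Python, not the authors' code. The function names `singular_kernel` and `nw_estimate`, the toy data, and the convention of returning 0 when no training point lies within the bandwidth are assumptions made for this example.

```python
import math


def singular_kernel(u, a=0.49):
    """K(u) = |u|^(-a) * 1{|u| <= 1}, the kernel from the Figure 1 caption (1-D case)."""
    r = abs(u)
    if r > 1.0:
        return 0.0
    if r == 0.0:
        return math.inf  # the singularity at the origin
    return r ** (-a)


def nw_estimate(x, xs, ys, h, a=0.49):
    """Nadaraya-Watson style weighted average of ys with singular kernel weights."""
    # At a training point the weight is infinite, so the estimate is
    # defined to be that point's label: this is what forces interpolation.
    for xi, yi in zip(xs, ys):
        if x == xi:
            return yi
    weights = [singular_kernel((x - xi) / h, a) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # assumed convention: no training point within distance h
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data (hypothetical values, chosen only for illustration).
xs = [0.1, 0.35, 0.6, 0.9]
ys = [1.0, -0.5, 2.0, 0.3]

fitted = [nw_estimate(x, xs, ys, h=0.5) for x in xs]
print(fitted == ys)                      # the fit reproduces every training label
print(nw_estimate(0.5, xs, ys, h=0.5))   # between data points: a weighted average
```

Away from the training points the estimate is an ordinary local average, so the bandwidth `h` still controls smoothing as it does for a non-singular kernel. Only the behavior at and very near the data points changes.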

Theorems & Definitions (9)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof