Does data interpolation contradict statistical optimality?

Mikhail Belkin; Alexander Rakhlin; Alexandre B. Tsybakov

TL;DR

This work demonstrates that interpolating estimators built from singular kernels can achieve the classical minimax rates for nonparametric regression with Hölder-smooth functions, challenging the notion that interpolation must harm statistical performance. By decomposing the risk into bias and variance terms and balancing them through the choice of bandwidth, the authors prove finite-sample risk bounds of the form $E\|f_n-f\|^2_{L_2(P_X)} \le C n^{-2\beta/(2\beta+d)}$ for $\beta\in(0,2]$, with extensions to higher smoothness under density regularity. The results cover both pointwise MSE and integrated risk, and imply optimal behavior for square-loss prediction even though the estimator interpolates the data. The findings offer a conceptual bridge between interpolating machine learning models, such as deep networks, and classical statistical optimality, and suggest broader applicability of interpolation-based estimators in nonparametric settings.
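The bandwidth balancing mentioned above can be sketched with the standard heuristic for kernel-type estimators. This is an illustration under stated assumptions, not the authors' exact bounds: it assumes a squared bias of order $h^{2\beta}$ and a variance of order $1/(nh^d)$, where the bandwidth $h$ and the constants $C_1, C_2$ are notation introduced here.

```latex
% Assumed form of the pointwise bias-variance bound (constants C_1, C_2 unspecified)
\mathbb{E}\big(f_n(x)-f(x)\big)^2 \;\le\; C_1 h^{2\beta} + \frac{C_2}{n h^{d}}
% Equating the two terms: h^{2\beta} = (n h^{d})^{-1}, i.e. h^{2\beta+d} = n^{-1}
h \asymp n^{-1/(2\beta+d)}
% Substituting back gives the rate quoted in the TL;DR
\mathbb{E}\big(f_n(x)-f(x)\big)^2 \;\le\; C\, n^{-2\beta/(2\beta+d)}
```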

Abstract

We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.
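To make the idea of an interpolating kernel estimator concrete, here is a minimal sketch of a Nadaraya-Watson estimator with a singular kernel. It assumes a kernel of the form $K(u)=\|u\|^{-a}\,\mathbf{1}\{\|u\|\le 1\}$ with $0<a<d/2$; the function name `singular_kernel_nw`, the parameters `h` and `a`, the convention of returning 0 when no training point falls within the bandwidth, and the toy data are all illustrative choices rather than details taken from the text above.

```python
import numpy as np

def singular_kernel_nw(x_train, y_train, x_query, h, a):
    """Nadaraya-Watson estimate with the assumed singular kernel
    K(u) = ||u||^(-a) * 1{||u|| <= 1}.

    x_train: array of shape (n, d); y_train: shape (n,); x_query: shape (m, d).
    """
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    y_train = np.asarray(y_train, dtype=float)

    # Bandwidth-scaled pairwise distances, shape (m, n).
    dist = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2) / h

    preds = np.zeros(len(x_query))
    for j, d_row in enumerate(dist):
        exact = d_row == 0.0
        if exact.any():
            # The kernel weight is infinite at a training point, so the
            # estimate equals the observed label there (interpolation).
            preds[j] = y_train[exact].mean()
            continue
        inside = d_row <= 1.0
        if not inside.any():
            continue  # assumed convention: predict 0 when no neighbor is in range
        weights = d_row[inside] ** (-a)
        preds[j] = np.dot(weights, y_train[inside]) / weights.sum()
    return preds

# Toy usage on synthetic data (hypothetical regression function and noise level).
rng = np.random.default_rng(0)
n, d = 200, 1
X = rng.uniform(-1.0, 1.0, size=(n, d))
Y = np.sin(3.0 * X[:, 0]) + 0.3 * rng.standard_normal(n)

fit_on_train = singular_kernel_nw(X, Y, X, h=0.3, a=0.25)
grid = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
fit_on_grid = singular_kernel_nw(X, Y, grid, h=0.3, a=0.25)
```

At the training inputs the estimate reproduces the labels, while away from them it is a locally weighted average of nearby labels, which is what allows the noise to be averaged out despite interpolation.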
