Does data interpolation contradict statistical optimality?
Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov
TL;DR
This work demonstrates that interpolating estimators built from singular kernels can achieve the classical minimax rates for nonparametric regression over Hölder-smooth functions, challenging the notion that interpolation must harm statistical performance. By decomposing the risk into bias and variance terms and choosing the bandwidth to balance them, the authors prove finite-sample risk bounds of the form $E\|f_n-f\|^2_{L_2(P_X)} \le C n^{-2\beta/(2\beta+d)}$ for $\beta\in(0,2]$, with extensions to higher smoothness under regularity conditions on the density. The results cover both pointwise MSE and integrated risk, and imply optimal rates for square-loss prediction despite exact interpolation of the training data. The findings offer a conceptual bridge between interpolating machine learning models, such as deep networks, and classical statistical optimality, and suggest that interpolation-based estimators may be applicable more broadly in nonparametric settings.
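To make the idea of an interpolating kernel estimator concrete, here is a minimal sketch of a Nadaraya-Watson estimator with a kernel that is singular at zero. It is an illustration under stated assumptions rather than the authors' exact construction: the kernel form $K(u) = \|u\|^{-a}\,\mathbf{1}\{\|u\|\le 1\}$ with $0 < a < d/2$, the function names, the convention of returning 0 when no training point lies within the bandwidth, and the bandwidth choice $h = n^{-1/(2\beta+d)}$ (the usual bias-variance balancing rate) are all assumptions made for this example.

```python
import math
import random


def singular_kernel(u_norm, a):
    """Assumed kernel: K(u) = ||u||^(-a) for 0 < ||u|| <= 1, and 0 outside the unit ball.

    The caller handles u_norm == 0 separately, since the kernel is infinite there.
    """
    if u_norm > 1.0:
        return 0.0
    return u_norm ** (-a)


def nw_interpolating(x, X, Y, h, a):
    """Nadaraya-Watson estimate at x using the singular kernel above.

    x is a sequence of floats, X a list of such sequences, Y a list of floats.
    Training points are assumed distinct (true almost surely for continuous X).
    """
    num = 0.0
    den = 0.0
    for xi, yi in zip(X, Y):
        dist = math.dist(x, xi)
        if dist == 0.0:
            # Infinite weight on this training point: the estimate equals its label,
            # which is exactly what makes the estimator interpolate the data.
            return yi
        w = singular_kernel(dist / h, a)
        num += w * yi
        den += w
    # Assumed convention: return 0 when no training point is within distance h.
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    random.seed(0)
    n, d, beta, a = 200, 1, 2.0, 0.25      # a must lie in (0, d/2)
    h = n ** (-1.0 / (2.0 * beta + d))     # assumed bias-variance balancing bandwidth

    def f(t):
        return math.sin(2.0 * math.pi * t)  # hypothetical regression function

    X = [(random.random(),) for _ in range(n)]
    Y = [f(xi[0]) + random.gauss(0.0, 0.3) for xi in X]

    # The fit reproduces every noisy training label exactly.
    print(all(nw_interpolating(xi, X, Y, h, a) == yi for xi, yi in zip(X, Y)))

    # Away from the training points it behaves like a local weighted average.
    grid = [(k + 0.5) / 100.0 for k in range(100)]
    mse = sum((nw_interpolating((t,), X, Y, h, a) - f(t)) ** 2 for t in grid) / len(grid)
    print(mse)
```

The singularity forces the fit through every training label, while at any other point the weights are finite and the estimate is a weighted average of the labels within distance $h$; this is the mechanism that lets interpolation coexist with a controlled bias-variance trade-off.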
Abstract
We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.
