Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression
Motonobu Kanagawa
TL;DR
This work addresses the high computational cost of leave-one-out cross-validation (LOOCV) for $k$-NN regression by deriving an exact, fast LOOCV formula. Under a tie-breaking assumption, the LOOCV score for a given $k$ equals the mean squared error of $(k+1)$-NN regression on the full training set, scaled by $\left(\frac{k+1}{k}\right)^2$, so a single $(k+1)$-NN fit yields the LOOCV score for that $k$. The key contribution is the corollary ${\rm LOOCV}(k,D_n) = \left(\frac{k+1}{k}\right)^2 \frac{1}{n} \sum_{\ell=1}^n (\hat f_{k+1,D_n}(x_\ell) - y_\ell)^2$, along with empirical validation on real datasets and a discussion of the tie-breaking condition. This fast LOOCV computation facilitates rapid hyperparameter tuning and opens avenues for optimizing distance metrics in addition to $k$.
Abstract
We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one needs to fit $(k+1)$-NN regression only once, rather than repeating the training and validation of $k$-NN regression once per training point. Numerical experiments confirm the validity of the fast computation method.
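The identity can be checked numerically with a minimal sketch. The intuition is that each training point $x_\ell$ is its own nearest neighbour in the full training set, so the $(k+1)$-NN prediction at $x_\ell$ averages $y_\ell$ with the $k$ neighbours that leave-one-out $k$-NN would use, which shrinks the residual by exactly $k/(k+1)$. The code below is an illustration under stated assumptions, not the paper's implementation: the function names (`knn_predict`, `loocv_naive`, `loocv_fast`) are hypothetical, distances are brute-force Euclidean, and the data are synthetic continuous features, so distance ties (the case excluded by the tie-breaking condition) almost surely do not occur.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Plain k-NN regression: average the targets of the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training points for each query point.
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return y_train[idx].mean(axis=1)

def loocv_naive(X, y, k):
    """LOOCV by definition: n separate fits, each leaving one point out."""
    n = len(y)
    sq_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the i-th training point
        pred = knn_predict(X[keep], y[keep], X[i:i + 1], k)[0]
        sq_errors[i] = (pred - y[i]) ** 2
    return sq_errors.mean()

def loocv_fast(X, y, k):
    """LOOCV via the scaled training error of (k+1)-NN on the full data."""
    residuals = knn_predict(X, y, X, k + 1) - y
    return ((k + 1) / k) ** 2 * np.mean(residuals ** 2)

# Synthetic data (an assumption for illustration; not the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

for k in (1, 3, 5):
    print(k, loocv_naive(X, y, k), loocv_fast(X, y, k))
```

Under these assumptions the two columns should agree up to floating-point rounding for each $k$. With duplicated inputs or tied distances, the agreement depends on how ties are broken, which is the situation the tie-breaking condition addresses.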
