A Novel Approach to Regularising 1NN classifier for Improved Generalization
Aditya Challa, Sravan Danda, Laurent Najman
TL;DR
This work introduces Watershed Classifiers, a greedy, non-parametric regularization of the 1NN classifier that can learn arbitrarily complex boundaries on dense data while maintaining a small VC dimension controlled by N_SEEDS. A dedicated watershed loss is proposed to train neural representations so that propagation from seeds yields label assignments consistent with greedy 1NN labeling. Empirically, Watershed outperforms Neighbourhood Component Analysis and matches or surpasses linear classifiers across several datasets and network scales, challenging the notion that non-parametric methods cannot reach state-of-the-art performance. The approach combines a minimum spanning tree–like propagation with a non-convex, batch-sensitive loss, offering a practical framework for robust generalization in embedding learning with non-parametric classifiers.
Abstract
In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel, greedy way to regularize 1NN classifiers. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to over-fit trivially and to have a very large VC dimension, and hence do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any sufficiently dense dataset while having a very small VC dimension; hence a watershed classifier leads to good generalization. The traditional approach to regularizing 1NN classifiers is to consider $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with the ($n-1$) nearest neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function that learns representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
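To make the greedy idea concrete, the following is a minimal sketch of seed-based label propagation in the spirit of the description above. It is not the authors' implementation: it assumes Euclidean distance, a Prim-style growth of the labelled set, and non-negative integer class labels. The function name `watershed_labels` and its parameters are hypothetical. At each step, the unlabelled point closest to the labelled set is given the label of its nearest labelled point.

```python
import numpy as np


def watershed_labels(points, seed_idx, seed_labels):
    """Greedy label propagation from seeds (illustrative sketch only).

    points      : (n, d) array of feature vectors
    seed_idx    : indices of the initially labelled points (the seeds)
    seed_labels : non-negative integer labels, one per seed
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    labels = np.full(n, -1, dtype=int)
    labels[list(seed_idx)] = seed_labels

    # Dense pairwise Euclidean distances (O(n^2) memory; fine for a sketch).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    labelled = labels >= 0
    best = np.full(n, np.inf)        # distance to the nearest labelled point
    src = np.full(n, -1, dtype=int)  # index of that nearest labelled point
    for s in seed_idx:
        closer = dist[s] < best
        best[closer] = dist[s][closer]
        src[closer] = s
    best[labelled] = np.inf          # seeds are never re-selected

    for _ in range(n - len(seed_idx)):
        i = int(np.argmin(best))     # unlabelled point closest to the labelled set
        labels[i] = labels[src[i]]   # copy the label of its nearest labelled point
        labelled[i] = True
        best[i] = np.inf
        # The newly labelled point may now be the nearest labelled neighbour
        # of some points that are still unlabelled.
        closer = (dist[i] < best) & ~labelled
        best[closer] = dist[i][closer]
        src[closer] = i
    return labels
```

For example, with one-dimensional points at 0, 1, 2, 3, 4, 5 and 8, and seeds at 0 (class 0) and 8 (class 1), this sketch assigns class 0 to the point at 5, because the label travels along the dense chain from 0. A plain 1NN lookup against the two seeds alone would instead assign it class 1, since 8 is the closer seed.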
