A Novel Approach to Regularising 1NN classifier for Improved Generalization

Aditya Challa, Sravan Danda, Laurent Najman

TL;DR

This work introduces Watershed Classifiers, a greedy, non-parametric regularization of the 1NN classifier that can learn arbitrarily complex boundaries on dense data while maintaining a small VC dimension controlled by N_SEEDS. A dedicated watershed loss is proposed to train neural representations so that propagation from seeds yields label assignments consistent with greedy 1NN labeling. Empirically, Watershed outperforms Neighbourhood Component Analysis and matches or surpasses linear classifiers across several datasets and network scales, challenging the notion that non-parametric methods cannot reach state-of-the-art performance. The approach combines a minimum spanning tree–like propagation with a non-convex, batch-sensitive loss, offering a practical framework for robust generalization in embedding learning with non-parametric classifiers.

Abstract

In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel way to regularize 1NN classifiers using a greedy approach. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to over-fit trivially and to have a very large VC dimension, and hence do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset and, at the same time, have a very small VC dimension; hence a watershed classifier leads to good generalization. The traditional approach to regularizing 1NN classifiers is to consider $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with the ($n-1$) nearest neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function which can learn representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
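The greedy 1NN propagation that the abstract refers to can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_1nn_propagate`, the use of Euclidean distance as the edge weight, and the convention that `-1` marks an unlabelled point are all assumptions made for illustration. At each step, the unlabelled point closest to any already labelled point takes that point's label, which grows the labelled set in a Prim-like fashion.

```python
import numpy as np


def greedy_1nn_propagate(points, seed_labels):
    """Propagate labels from seeds, one point at a time (illustrative sketch).

    points: (n, d) array of representations.
    seed_labels: length-n integer array; -1 marks an unlabelled point
        (an assumed convention, not taken from the paper).
    """
    points = np.asarray(points, dtype=float)
    labels = np.array(seed_labels, dtype=int)
    if not (labels >= 0).any():
        raise ValueError("at least one seed is required")

    # Pairwise Euclidean distances, used here as the assumed edge weights.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    while (labels < 0).any():
        labelled = np.flatnonzero(labels >= 0)
        unlabelled = np.flatnonzero(labels < 0)
        # Cheapest edge between the labelled and the unlabelled set.
        sub = dist[np.ix_(labelled, unlabelled)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        # The unlabelled endpoint inherits the label of the labelled endpoint.
        labels[unlabelled[j]] = labels[labelled[i]]
    return labels


# Two well-separated groups on a line, one seed per class.
toy_points = [[0.0], [1.0], [2.1], [10.0], [11.3], [12.5]]
toy_seeds = [0, -1, -1, -1, -1, 1]
toy_result = greedy_1nn_propagate(toy_points, toy_seeds)
```

With one seed per class, the number of seeds plays the role that `N_SEEDS` plays in the summary above. The quadratic distance matrix keeps the sketch short and is only suitable for small inputs.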

Paper Structure (46 sections, 1 theorem, 5 equations, 6 figures, 3 tables, 1 algorithm)

This paper contains 46 sections, 1 theorem, 5 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let $\{{\bm{x}}_i\}$ denote the set of data points and ${\mathcal{G}}_{\mathcal{D}} = (V,E,W)$ denote the complete graph. Assume that $W({\bm{x}}_i, {\bm{x}}_j) \neq W({\bm{x}}_k, {\bm{x}}_l)$ for all $i\neq k$ or $j\neq l$. That is, all edge weights are assumed to be distinct. Let the number of c

Figures (6)

  • Figure 1: Illustrating the labelling preference of the watershed classifier. (a) shows an arbitrary set of data points. Blue dots indicate class 0 and red dots indicate class 1; unlabelled points have no colour. A few selected edges are shown with their edge weights. (b) and (c) show two different labellings. Observe that the margin of (b) is $0.5$, while the margin of (c) is $1$. Hence (c) is considered a better labelling than (b) under the Maximum Margin Principle.
  • Figure 2: Comparison of the Watershed Classifier with a Linear Classifier and a Decision Tree. On a simple toy example with a fixed VC dimension of $3$, observe that the watershed classifier can find the right boundary.
  • Figure 3: Illustrating the computation of the watershed loss. (a) shows the toy representations obtained from $f_{\theta}$ along with the ground-truth labels. (b) shows the seeds selected from each class. We assume $\texttt{N\_SEEDS}=1$ in this case. (c) illustrates the propagation of the labels. Note that $x_4$ has a ground-truth label of red, but is labelled blue by label propagation. All the other labels match the ground truth. (d) identifies the correctly labelled samples. (e) identifies, for each class, the closest correctly labelled sample. In this case, for $x_4$, the closest blue sample is $x_1$ and the closest red sample is $x_3$. Note that this is computed for every sample; only $x_4$ is shown here for illustration. Finally, we compute the loss for each sample using \ref{eq:loss1} and \ref{eq:loss2}.
  • Figure 4: Representation Capacity of Watershed Classifier.
  • Figure 5: Illustrating that the loss function in \ref{sec:train_nn_watershed} is indeed consistent with greedy 1NN propagation. The purple dots indicate class $1$ and the yellow dots indicate class $0$. Observe that as training progresses, the number of cross-edges in the minimum spanning tree decreases. Note that greedy 1NN propagation is similar to Prim's algorithm for constructing a minimum spanning tree. Hence the number of cross-edges provides a good measure of the efficacy of propagation.
  • ...and 1 more figure
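The cross-edge count mentioned in the Figure 5 caption can be sketched as follows. This is an illustrative reading rather than the authors' evaluation code: it assumes a cross-edge is a minimum spanning tree edge whose endpoints carry different labels, uses Euclidean distance as the edge weight, and the name `mst_cross_edges` is hypothetical. The tree is built with a plain Prim's algorithm over the complete graph.

```python
import numpy as np


def mst_cross_edges(points, labels):
    """Count MST edges joining points with different labels (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(points)

    # Pairwise Euclidean distances, used here as the assumed edge weights.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True               # Prim's algorithm, started from vertex 0
    best = dist[0].copy()           # cheapest known edge from the tree to each vertex
    parent = np.zeros(n, dtype=int) # tree endpoint of that cheapest edge

    cross = 0
    for _ in range(n - 1):
        # Pick the vertex outside the tree with the cheapest connecting edge.
        candidates = np.where(in_tree, np.inf, best)
        v = int(np.argmin(candidates))
        cross += int(labels[v] != labels[parent[v]])
        in_tree[v] = True
        # Relax edges out of the newly added vertex.
        closer = dist[v] < best
        best[closer] = dist[v][closer]
        parent[closer] = v
    return cross


line_points = [[0.0], [1.0], [2.1], [10.0], [11.3], [12.5]]
separated = mst_cross_edges(line_points, [0, 0, 0, 1, 1, 1])
interleaved = mst_cross_edges(line_points, [0, 1, 0, 1, 0, 1])
```

Under these assumptions, well-separated classes leave a single cross-edge bridging the two groups, while interleaved labels make every tree edge a cross-edge, which matches the caption's use of the count as a proxy for how well propagation would work.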

Theorems & Definitions (2)

  • Theorem 1
  • Proof 1