Bags of Projected Nearest Neighbours: Competitors to Random Forests?

David P. Hofmeyr

TL;DR

The work addresses the limited benefit that bagging brings to $k$NN classifiers, whose fits vary little across bootstrap samples, by introducing Bag Of Projected Nearest Neighbours (BOPNN), which learns a discriminant subspace for each bootstrap sample to adaptively steer nearest-neighbour decisions. The method combines a simple discriminant subspace, computed from $\hat{\Sigma}_{in}^{-1}\hat{\Sigma}_{out}$, with randomised subspace selection to enhance diversity and efficiency, enabling effective bagging of $k$NN classifiers. Empirically, BOPNN performs on par with Random Forests across 162 datasets, with SVM often leading and ES$k$NN lagging; variable importance measures and visualisations are natural byproducts of the subspace framework. The approach offers a practical, scalable alternative to Random Forests that preserves interpretability and provides actionable diagnostics, while leaving better handling of categorical variables and class imbalance to future work.

Abstract

In this paper we introduce a simple and intuitive adaptive $k$ nearest neighbours classifier, and explore its utility within the context of bootstrap aggregating ("bagging"). The approach is based on finding discriminant subspaces which are efficient to compute, and are motivated by enhancing the discrimination of classes through nearest neighbour classifiers. This adaptiveness promotes diversity of the individual classifiers fit across different bootstrap samples, and so further leverages the variance-reducing effect of bagging. Extensive experimental results are presented documenting the strong performance of the proposed approach in comparison with Random Forest classifiers, as well as other nearest-neighbour-based ensembles from the literature, plus other relevant benchmarks. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/BOPNN.
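The overall recipe (bootstrap sample, discriminant projection, $k$NN vote, aggregation) can be illustrated with a minimal Python sketch. It is not the author's implementation, which is the R package linked above. Several pieces are assumptions made purely for illustration: $\hat{\Sigma}_{in}$ and $\hat{\Sigma}_{out}$ are replaced by ordinary within-class and between-class scatter matrices (this summary does not define them), "randomised subspace selection" is interpreted as drawing a random subset of features per bag, and the names `discriminant_subspace`, `knn_labels` and `bopnn_sketch`, as well as all default parameter values, are hypothetical.

```python
import numpy as np


def discriminant_subspace(X, y, dim=2, ridge=1e-6):
    """Leading eigenvectors of inv(S_in) @ S_out.

    ASSUMPTION: S_in / S_out are within-class / between-class scatter,
    used here as stand-ins for the paper's Sigma_in / Sigma_out.
    """
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_in = ridge * np.eye(d)  # small ridge keeps S_in positive definite
    S_out = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_in += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        S_out += len(Xc) * (diff @ diff.T)
    # Solve the eigenproblem through a symmetric reformulation:
    # with S_in = L L^T, eigenvectors of inv(S_in) S_out are inv(L)^T u,
    # where u are eigenvectors of inv(L) S_out inv(L)^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_in))
    _, U = np.linalg.eigh(L_inv @ S_out @ L_inv.T)  # ascending eigenvalues
    return L_inv.T @ U[:, ::-1][:, :dim]


def knn_labels(Z_train, y_train, Z_test, k):
    """Labels of the k nearest training points for each test point."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest]


def bopnn_sketch(X, y, X_test, n_bags=25, k=3, n_feats=None, dim=2, seed=0):
    """Bagged kNN, each bag using its own projection (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_feats = d if n_feats is None else n_feats
    classes = np.unique(y)
    votes = np.zeros((len(X_test), len(classes)))
    for _ in range(n_bags):
        rows = rng.integers(0, n, size=n)  # bootstrap sample of the rows
        cols = rng.choice(d, size=n_feats, replace=False)  # assumed feature subset
        Xb, yb = X[rows][:, cols], y[rows]
        V = discriminant_subspace(Xb, yb, dim=min(dim, n_feats))
        labels = knn_labels(Xb @ V, yb, X_test[:, cols] @ V, k)
        for j, c in enumerate(classes):
            votes[:, j] += (labels == c).sum(axis=1)
    return classes[votes.argmax(axis=1)]
```

Because each bag learns its projection from its own bootstrap sample, the neighbourhoods differ from bag to bag, which is the source of diversity the abstract refers to. Calling `bopnn_sketch(X_train, y_train, X_test)` on NumPy arrays returns one predicted label per test row.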

Paper Structure

This paper contains 18 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: 2-Dimensional projections of the pen-based recognition of handwritten digits and image segmentation data sets.
  • Figure 2: Boxplots of accuracy distributions for different classification models, using two different standardisations.
  • Figure 3: Standardised classification accuracy across all data sets.
  • Figure 4: Correlations between studentised accuracy and data set characteristics. Left: marginal correlations. Right: OLS coefficients.

Theorems & Definitions (3)

  • Remark 1
  • Remark 2
  • Remark 3