Optimal Extended Neighbourhood Rule $k$ Nearest Neighbours Ensemble

Amjad Ali, Zardad Khan, Dost Muhammad Khan, Saeed Aldahmani

TL;DR

The paper addresses two limitations of traditional $k$NN: poor performance when a test point falls outside the conventional (spherical) neighbourhood, and ensembles that are destabilized by non-informative features. It introduces the Optimal Extended Neighbourhood Rule (OExNRule) ensemble, which builds a large number of base ExNRule models on bootstrap samples with random feature subsets ($p'\le p$ features), ranks them by out-of-bag error, and aggregates the best-performing models by majority voting. The extended neighbourhood rule selects the $k$ neighbours sequentially, in $k$ steps, using a $q$-norm-based distance. Empirical results on 17 benchmark datasets, including versions with added contrived features, show that OExNRule often outperforms classical $k$NN variants, RF, OTE, and SVM in accuracy, Cohen's $\kappa$ (higher is better), and Brier score (lower is better). The work also points to distance weighting and feature selection as directions for further improvement.
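
The three evaluation measures named above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the Brier score is written in its usual binary form (mean squared difference between the predicted probability of class 1 and the 0/1 outcome), which is an assumption about the variant used in the paper.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def cohens_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when p_e == 1, i.e. when both label
    vectors contain a single, identical class.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_o = np.mean(y_true == y_pred)                      # observed agreement
    classes = np.union1d(y_true, y_pred)
    # agreement expected by chance, from the marginal class frequencies
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return float((p_o - p_e) / (1.0 - p_e))

def brier_score(y_true, prob_pos):
    """Binary Brier score (assumed form): mean of (p_hat - y)^2, with y in {0, 1}."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob_pos, dtype=float)
    return float(np.mean((p - y) ** 2))
```

Higher accuracy and $\kappa$ indicate better classification, while a lower Brier score indicates predicted probabilities that lie closer to the observed outcomes.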

Abstract

The traditional k nearest neighbor (kNN) approach uses a distance formula within a spherical region to determine the k closest training observations to a test sample point. However, this approach may not work well when the test point is located outside this region. Moreover, aggregating many base kNN learners can result in poor ensemble performance due to high classification errors. To address these issues, this paper proposes a new optimal extended neighborhood rule based ensemble method. The rule determines neighbors in k steps, starting from the sample point closest to the unseen observation and selecting subsequent nearest data points until the required number of observations is reached. Each base model is constructed on a bootstrap sample with a random subset of features, and, after a sufficient number of models has been built, the optimal models are selected based on out-of-bag performance. The proposed ensemble is compared with state-of-the-art methods on 17 benchmark datasets using accuracy, Cohen's kappa, and Brier score (BS). The performance of the proposed method is also assessed after adding contrived features to the original data.
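
The procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default parameters (`n_models`, `keep_frac`, a square-root feature-subset size), and the tie-breaking rule are all hypothetical. It also assumes that each of the k steps searches for the nearest not-yet-selected training point starting from the point chosen in the previous step, which is one reading of "selecting subsequent nearest data points".

```python
import numpy as np

def exnrule_predict(X_train, y_train, x, k=3, q=2):
    """Classify x by a chain of k neighbours (assumed reading of the rule)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    available = np.ones(len(X_train), dtype=bool)   # points not yet selected
    current = np.asarray(x, dtype=float)
    chosen = []
    for _ in range(min(k, len(X_train))):
        # q-norm (Minkowski) distance from the current point to all training points
        dist = (np.abs(X_train - current) ** q).sum(axis=1) ** (1.0 / q)
        dist[~available] = np.inf                   # never select a point twice
        idx = int(np.argmin(dist))
        chosen.append(idx)
        available[idx] = False
        current = X_train[idx]                      # next step starts from this point
    labels, counts = np.unique(y_train[chosen], return_counts=True)
    return labels[np.argmax(counts)]                # majority vote; ties -> smallest label

def fit_oexnrule(X, y, n_models=50, keep_frac=0.2, n_features=None, k=3, q=2, seed=0):
    """Build base models on bootstrap samples and keep those with the lowest OOB error."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_sub = n_features or max(1, int(np.sqrt(p)))   # assumed default subset size
    models = []
    for _ in range(n_models):
        boot = rng.integers(0, n, size=n)           # bootstrap row indices
        oob = np.setdiff1d(np.arange(n), boot)      # out-of-bag rows
        feats = rng.choice(p, size=p_sub, replace=False)
        Xb, yb = X[np.ix_(boot, feats)], y[boot]
        preds = np.array([exnrule_predict(Xb, yb, X[i, feats], k, q) for i in oob])
        err = float(np.mean(preds != y[oob])) if oob.size else 1.0
        models.append((err, Xb, yb, feats))
    models.sort(key=lambda m: m[0])                 # rank by out-of-bag error
    n_keep = max(1, int(round(keep_frac * n_models)))
    return models[:n_keep]

def predict_oexnrule(models, X_new, k=3, q=2):
    """Majority vote over the selected base models."""
    out = []
    for x in np.asarray(X_new, dtype=float):
        votes = [exnrule_predict(Xb, yb, x[feats], k, q) for _, Xb, yb, feats in models]
        labels, counts = np.unique(votes, return_counts=True)
        out.append(labels[np.argmax(counts)])
    return np.array(out)
```

A call such as `predict_oexnrule(fit_oexnrule(X, y), X_test)` would then return one predicted label per test row. Storing the bootstrap sample and the feature indices with each base model keeps the sketch simple, since a nearest-neighbour learner has no training step beyond memorising its data.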

Paper Structure

This paper contains 12 sections, 3 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Flow chart of the proposed optimal extended neighbourhood rule (OExNRule) ensemble.
  • Figure 2: Classification accuracy of the OExNRule and the other state-of-the-art procedures on the benchmark datasets in their original form.
  • Figure 3: Cohen's kappa of the OExNRule and the other state-of-the-art procedures on the benchmark datasets in their original form.
  • Figure 4: Brier score (BS) of the OExNRule and the other state-of-the-art procedures on the benchmark datasets in their original form.