On high-dimensional modifications of the nearest neighbor classifier

Annesha Ghosh, Deep Ghoshal, Bilol Banerjee, Anil K. Ghosh

TL;DR

This article reviews existing high-dimensional modifications of the nearest neighbor classifier, proposes new ones, studies them theoretically, and compares their empirical performance with that of existing methods on several simulated and benchmark datasets.

Abstract

The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often performs poorly in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to address this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out theoretical investigations of these methods and analyze several simulated and benchmark datasets to compare the empirical performance of the proposed methods with that of some existing ones.
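
The HDLSS failure described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the Gaussian setup, the sample sizes, the value of `sigma`, and the function names are all assumptions chosen for illustration. Two classes share the same location (the origin) and differ only in scale: class 0 is $N(0, I_d)$ and class 1 is $N(0, \sigma^2 I_d)$ with $\sigma > 1$. For independent $X, X'$ from class 0 and $Y, Y'$ from class 1, the expected squared distances are $E\|X-X'\|^2 = 2d$, $E\|X-Y\|^2 = (1+\sigma^2)d$ and $E\|Y-Y'\|^2 = 2\sigma^2 d$. Since $1+\sigma^2 < 2\sigma^2$, a class 1 point is on average closer to class 0 training points than to those of its own class, and as $d$ grows the distances concentrate around these means, so the 1-nearest-neighbor rule tends to assign almost every test point to class 0.

```python
import numpy as np


def nn_predict(train_x, train_y, test_x):
    """1-nearest-neighbor labels under squared Euclidean distance."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    d2 = (
        (test_x ** 2).sum(axis=1)[:, None]
        + (train_x ** 2).sum(axis=1)[None, :]
        - 2.0 * test_x @ train_x.T
    )
    return train_y[np.argmin(d2, axis=1)]


def nn_error_by_class(dim, n_train=10, n_test=200, sigma=2.0, seed=0):
    """Per-class 1-NN error rates for a scale-only two-class problem.

    Illustrative assumption (not from the paper): class 0 ~ N(0, I) and
    class 1 ~ N(0, sigma^2 I), with n_train training points per class.
    """
    rng = np.random.default_rng(seed)
    train_x = np.vstack([
        rng.normal(0.0, 1.0, size=(n_train, dim)),
        rng.normal(0.0, sigma, size=(n_train, dim)),
    ])
    train_y = np.repeat([0, 1], n_train)

    test0 = rng.normal(0.0, 1.0, size=(n_test, dim))
    test1 = rng.normal(0.0, sigma, size=(n_test, dim))

    err0 = float(np.mean(nn_predict(train_x, train_y, test0) != 0))
    err1 = float(np.mean(nn_predict(train_x, train_y, test1) != 1))
    return err0, err1


if __name__ == "__main__":
    for dim in (2, 20, 200, 2000):
        err0, err1 = nn_error_by_class(dim)
        print(f"d={dim:5d}  error on class 0: {err0:.2f}  error on class 1: {err1:.2f}")
```

Under these assumptions, the class 1 error rate should approach 1 as `dim` increases while the class 0 error rate approaches 0, which gives an overall misclassification rate near 0.5 even though the two classes are well separated in scale.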

Paper Structure (7 sections, 3 theorems, 12 equations, 12 figures, 1 table)

Key Result

Theorem 1

If $J$ competing classes satisfy assumptions (A1)-(A3), and there are at least two observations from each of them (i.e., $n_j\ge 2$ for all $j=1,2,\ldots, J$), then we have the following results.

Figures (12)

  • Figure 1: Misclassification rates of Bayes, NN, CH, and MCH classifiers in Examples 1-3.
  • Figure 2: Misclassification rates of Bayes, NN, CH, and MCH classifiers in Examples 4-6.
  • Figure 3: Scatter plots of training (top row) and test (bottom row) samples along with the class boundaries estimated by NN, CH, MCH, and MDist classifiers in Example 4.
  • Figure 4: Scatter plots of the test samples and the class boundaries estimated by NN, CH, MCH, and MDist classifiers in Example 5.
  • Figure 5: Scatter plots of the test samples and the class boundaries estimated by NN, CH, MCH, and MDist classifiers in Example 6.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3