Nearest Neighbour Equilibrium Clustering

David P. Hofmeyr

TL;DR

The paper tackles unsupervised clustering by introducing Nearest Neighbour Equilibrium Clustering (NNEC), a method that defines clusters through an equilibrium condition balancing size and cohesiveness of neighbourhoods. Clusters are grown iteratively from seeds and final assignments are determined by maximising a per-point membership strength, with automatic tuning of parameters $k$ and $\lambda$ via a normalization-based criterion. The approach is evaluated on 45 public datasets against a suite of competitive methods, showing that NNEC achieves the best average performance across AMI, ARI, and accuracy, while remaining simple and scalable. An open-source implementation is provided, highlighting the method’s practicality for automated exploratory clustering in data-rich settings.
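The TL;DR describes clusters being grown iteratively from seeds until an equilibrium is reached. This summary does not reproduce the paper's equilibrium condition, so the sketch below substitutes a hypothetical rule. A point x is kept in cluster C when its k-nearest-neighbour set N_k(x), which is assumed here to include x itself, contains at least λ times as many members of C as a random set of |C| points would roughly contribute: |N_k(x) ∩ C| ≥ λk|C|/n. The size term λk|C|/n rises as the cluster grows, while the neighbour count rewards cohesiveness. This is one possible reading of "balancing size and cohesiveness", not the NNEC definition. The names `knn_sets` and `grow_from_seed` are illustrative and are not part of the NNEC R package.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_sets(points: Sequence[Point], k: int) -> List[Set[int]]:
    """Indices of the k nearest points to each point, counting the point itself (assumption)."""
    n = len(points)
    neigh = []
    for i in range(n):
        # Sort by (distance, index) so that distance ties are broken deterministically.
        order = sorted(range(n), key=lambda j: (math.dist(points[i], points[j]), j))
        neigh.append(set(order[:k]))
    return neigh


def grow_from_seed(neigh: List[Set[int]], seed: int, lam: float,
                   max_iter: int = 100) -> Set[int]:
    """Iterate a HYPOTHETICAL equilibrium rule from a single seed point.

    A point x is kept in C when |N_k(x) & C| >= lam * k * |C| / n. The rule is
    re-applied until C stops changing, becomes empty, or max_iter is reached.
    """
    n, k = len(neigh), len(neigh[0])
    cluster = {seed}
    for _ in range(max_iter):
        threshold = lam * k * len(cluster) / n  # size term: grows with the cluster
        updated = {i for i in range(n) if len(neigh[i] & cluster) >= threshold}
        if updated == cluster or not updated:
            return updated
        cluster = updated
    return cluster


# Two well-separated groups of six points on a line.
pts = [(float(i), 0.0) for i in range(6)] + [(100.0 + i, 0.0) for i in range(6)]
neigh = knn_sets(pts, k=4)
print(sorted(grow_from_seed(neigh, seed=0, lam=1.5)))  # [0, 1, 2, 3, 4, 5]
print(sorted(grow_from_seed(neigh, seed=6, lam=1.5)))  # [6, 7, 8, 9, 10, 11]
```

Seeded at point 0, the toy cluster passes through {0, 1, 2} and {0, 1, 2, 3} before it settles on the whole first group. Under this stand-in rule the fixed point can depend on the seed, on k and on λ, so the example shows only the mechanics of growing to equilibrium. It does not show how NNEC itself behaves.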

Abstract

A novel and intuitive nearest neighbours based clustering algorithm is introduced, in which a cluster is defined in terms of an equilibrium condition which balances its size and cohesiveness. The formulation of the equilibrium condition allows for a quantification of the strength of alignment of each point to a cluster, with these cluster alignment strengths leading naturally to a model selection criterion which renders the proposed approach fully automatable. The algorithm is simple to implement and computationally efficient, and produces clustering solutions of extremely high quality in comparison with relevant benchmarks from the literature. R code to implement the approach is available from https://github.com/DavidHofmeyr/NNEC.
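The abstract and TL;DR state two things. Each point's alignment to every equilibrium cluster is quantified, and each point is assigned to the cluster for which that strength is largest. Figure 3 also counts how many equilibrium clusters each point belongs to. The sketch below computes both quantities under an assumed strength: the margin by which a point's neighbour fraction |N_k(x) ∩ C|/k exceeds λ|C|/n. This is a hypothetical formula, not the paper's. Neighbourhoods are passed in as sets of indices, and the names `alignment_strength`, `overlap_counts` and `assign_by_strength` are illustrative.

```python
from typing import List, Sequence, Set


def alignment_strength(neigh: Sequence[Set[int]], cluster: Set[int], lam: float) -> List[float]:
    """HYPOTHETICAL strength of every point's alignment to `cluster`:
    neighbour fraction inside the cluster minus the size term lam * |C| / n."""
    n = len(neigh)
    size_term = lam * len(cluster) / n
    return [len(nb & cluster) / len(nb) - size_term for nb in neigh]


def overlap_counts(n: int, clusters: Sequence[Set[int]]) -> List[int]:
    """Number of equilibrium clusters that contain each point (cf. Figure 3)."""
    return [sum(i in c for c in clusters) for i in range(n)]


def assign_by_strength(neigh: Sequence[Set[int]], clusters: Sequence[Set[int]],
                       lam: float) -> List[int]:
    """Label each point with the index of the cluster it is most strongly aligned to."""
    strengths = [alignment_strength(neigh, c, lam) for c in clusters]
    return [max(range(len(clusters)), key=lambda c: strengths[c][i])
            for i in range(len(neigh))]


# Toy neighbourhoods (each includes the point itself) and two overlapping clusters.
neigh = [{0, 1, 2}, {1, 0, 2}, {2, 1, 3}, {3, 2, 4}, {4, 5, 3}, {5, 4, 3}]
clusters = [{0, 1, 2, 3}, {2, 3, 4, 5}]
print(overlap_counts(len(neigh), clusters))      # [1, 1, 2, 2, 1, 1]
print(assign_by_strength(neigh, clusters, 1.0))  # [0, 0, 0, 1, 1, 1]
```

Points 2 and 3 lie in both toy clusters, but each has a larger margin for one of them, so the overlap is resolved by the argmax.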

Paper Structure

This paper contains 10 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Running of Algorithm \ref{alg:grow} for $k = 25$ and $\lambda = 2$ after different numbers of iterations.
  • Figure 2: Equilibrium clusters for $k = 25$ and $\lambda = 2$ (these values were selected automatically using the approach described in Section \ref{sec:tuning}). (a)-(e) Individual equilibrium clusters; (f) final clustering solution.
  • Figure 3: Effect of inappropriate selection of $\lambda$. In each sub-figure, the left plot shows the number of equilibrium clusters each point belongs to, while the right plot shows the induced clustering solution. Both inappropriately large and inappropriately small values of $\lambda$ lead to a high degree of overlap among the equilibrium clusters.
  • Figure 4: Distributions of performance ranks across all data sets.
  • Figure 5: Distributions of $[0,1]$-mapped performance across all data sets.
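Figures 4 and 5 summarise performance across the 45 data sets in two ways: as per-data-set ranks and as scores mapped to $[0,1]$. The summary does not say how the mapping is done. The sketch below assumes a common choice, a min-max rescaling within each data set, so that the best method scores 1 and the worst 0. Tied scores share the average rank, and higher scores are treated as better, as with AMI, ARI and accuracy. The method names and scores in the usage lines are made up for illustration.

```python
from typing import Dict


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank methods on one data set (1 = best, higher score is better); ties share the mean rank."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for name, _ in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def unit_mapped(scores: Dict[str, float]) -> Dict[str, float]:
    """ASSUMED [0,1] mapping: min-max rescale one data set's scores (best -> 1, worst -> 0)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all methods tied: give every method the top value
        return {m: 1.0 for m in scores}
    return {m: (s - lo) / (hi - lo) for m, s in scores.items()}


# Made-up scores for four hypothetical methods on a single data set.
scores = {"method_a": 0.75, "method_b": 0.5, "method_c": 0.5, "method_d": 0.25}
print(average_ranks(scores))  # {'method_a': 1.0, 'method_b': 2.5, 'method_c': 2.5, 'method_d': 4.0}
print(unit_mapped(scores))    # {'method_a': 1.0, 'method_b': 0.5, 'method_c': 0.5, 'method_d': 0.0}
```

Applying both functions to every data set and collecting the values for each method gives the kind of per-method distributions that Figures 4 and 5 display.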