Cluster Catch Digraphs with the Nearest Neighbor Distance

Rui Shi; Elvan Ceyhan; Nedret Billor

TL;DR

This paper introduces UN-CCDs, a parameter-free CCD-based clustering method that uses the nearest neighbor distance (NND) within a Monte Carlo Spatial Randomness Test (MC-SRT) to determine covering-ball radii. By replacing Ripley’s K-function with NND in the MC-SRT and adding enhancements such as Holm-corrected tests, descending radius exploration, and an intersection-graph refinement, UN-CCDs improve clustering quality in high-dimensional data. Extensive Monte Carlo simulations and real-data experiments show that UN-CCDs are competitive with KS-CCDs and RK-CCDs, offering especially strong performance in high dimensions while remaining robust to noise. The work highlights a practical, scalable approach for high-dimensional clustering, with clear avenues for future extensions (overlapping clusters, semi-supervised settings, and automated tuning).
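
The Holm correction mentioned above is a standard step-down procedure for multiple testing. The sketch below shows only the generic procedure; how UN-CCDs apply it to their randomness tests is not described on this page, and the function name `holm_reject` and the default `alpha` are illustrative assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Generic Holm step-down procedure (illustrative, not the paper's code).

    Returns a list of booleans, one per input p-value, marking which
    null hypotheses are rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Step-down rule: stop at the first non-rejection.
            break
    return reject


decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
```

With these four p-values, the two smallest pass their thresholds (0.005 against 0.0125 and 0.01 against about 0.0167), while 0.03 fails against 0.025, so testing stops there.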

Abstract

We introduce a new clustering method based on Cluster Catch Digraphs (CCDs). The method addresses the limitations of RK-CCDs by using a new variant of the spatial randomness test that relies on the nearest neighbor distance (NND) instead of the Ripley's K function used by RK-CCDs. We conduct a comprehensive Monte Carlo analysis to assess the performance of our method, considering factors such as dimensionality, data set size, number of clusters, cluster volumes, and inter-cluster distance. Our method is particularly effective for high-dimensional data sets, performing comparably to or better than KS-CCDs and RK-CCDs, which rely on a KS-type statistic and Ripley's K function, respectively. We also evaluate our methods using real and complex data sets, comparing them to well-known clustering methods. Again, our methods exhibit competitive performance, producing high-quality clusters with desirable properties.

Keywords: Graph-based clustering, Cluster catch digraphs, High-dimensional data, Nearest neighbor distance, Spatial randomness test
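
As a rough illustration of a Monte Carlo spatial randomness test built on the NND, the sketch below compares the mean nearest neighbor distance of the points inside a ball with the same statistic for points simulated uniformly in a ball of equal radius. It is a minimal sketch under stated assumptions, not the paper's MC-SRT: the choice of the mean NND as the test statistic, the lower-tailed p-value, and all function names (`mean_nnd`, `uniform_in_ball`, `nnd_randomness_pvalue`) are hypothetical.

```python
import math
import random


def mean_nnd(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def uniform_in_ball(n, dim, radius, rng):
    """Draw n points uniformly from a ball of the given radius at the origin."""
    pts = []
    for _ in range(n):
        # A normalized Gaussian vector gives a uniform random direction.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        # Scaling by U^(1/dim) makes the radial part uniform in volume.
        r = radius * rng.random() ** (1.0 / dim)
        pts.append([r * x / norm for x in g])
    return pts


def nnd_randomness_pvalue(points, radius, n_sim=199, seed=0):
    """Lower-tailed Monte Carlo p-value for uniformity inside a ball.

    Assumption: an unusually small mean NND (points more clumped than
    uniform points in a ball of this radius) counts as evidence against
    spatial randomness. The statistic is translation invariant, so the
    simulated balls can be centered at the origin.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    observed = mean_nnd(points)
    as_small = sum(
        mean_nnd(uniform_in_ball(n, dim, radius, rng)) <= observed
        for _ in range(n_sim)
    )
    return (1 + as_small) / (n_sim + 1)


# 30 points clumped near the center of a unit ball in the plane.
rng = random.Random(1)
clumped = uniform_in_ball(30, 2, 0.01, rng)
p_clumped = nnd_randomness_pvalue(clumped, radius=1.0)
```

In a CCD-style procedure, a test of this kind would be repeated over candidate radii around each point to decide how far a covering ball can grow before the points inside stop looking uniform; the exact radius search used by UN-CCDs is not reproduced here.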

Paper Structure (25 sections, 4 equations, 11 figures, 12 tables, 3 algorithms)

Introduction and Motivation
Preliminaries
Cluster Methods in Literature
Cluster Catch Digraphs
Cluster Catch Digraphs Based on a KS-Type Statistic
Cluster Catch Digraphs Based on Ripley's K Function
Cluster Catch Digraphs using the Nearest Neighbor Distance
The Limitations of RK-CCDs for Clustering
The MC-SRT with the Nearest Neighbor Distance
Monte Carlo Experiments for Clustering Based on UN-CCDs
Experiment with Uniform Settings
Experiment with Gaussian Settings
Experiment on Datasets with Noise
Real Data Examples
"iris" datasets
...and 10 more sections

Figures (11)

Figure 1: An illustration of clustering with UN-CCDs. Top-left: A dataset consisting of 5 clusters generated from 5 different bivariate normal distributions. Top-right: The covering balls of an approximate minimum dominating set (MDS) obtained by Greedy Algorithm \ref{alg:greedy-outdegree-orig_digraph}. Bottom-left: The covering balls of an approximate MDS of the intersection graph. Bottom-right: The dominating covering balls of the intersection graph that maximize the silhouette index $Sil(P)$.
Figure 2: Realizations of the simulation settings with 2, 3, and 5 uniform clusters in $\mathbb{R}^2$.
Figure 3: Line plots of the adjusted Rand indices (ARIs) of KS-CCDs under the uniform cluster settings.
Figure 4: Line plots of the ARIs of RK-CCDs under the uniform cluster settings.
Figure 5: Line plots of the ARIs of UN-CCDs under the uniform cluster settings.
...and 6 more figures
