Cluster Catch Digraphs with the Nearest Neighbor Distance
Rui Shi, Elvan Ceyhan, Nedret Billor
TL;DR
This paper introduces UN-CCDs, a parameter-free CCD-based clustering method that uses the nearest neighbor distance (NND) within a Monte Carlo Spatial Randomness Test (MC-SRT) to determine covering-ball radii. By replacing Ripley’s K-function with NND in the MC-SRT and adding enhancements such as Holm-corrected tests, descending radius exploration, and an intersection-graph refinement, UN-CCDs improve clustering quality in high-dimensional data. Extensive Monte Carlo simulations and real-data experiments show that UN-CCDs are competitive with KS-CCDs and RK-CCDs, offering especially strong performance in high dimensions while remaining robust to noise. The work highlights a practical, scalable approach for high-dimensional clustering, with clear avenues for future extensions (overlapping clusters, semi-supervised settings, and automated tuning).
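The TL;DR mentions Holm-corrected tests but does not say how they are applied inside UN-CCDs. For reference only, the sketch below implements the standard Holm step-down procedure on a generic list of p-values. Which tests UN-CCDs treat as one family is not stated here, and the function name `holm_reject` is a hypothetical choice.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure (generic sketch, not the paper's code).

    Returns a list of booleans: True where the corresponding null
    hypothesis is rejected at family-wise error rate alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is compared to alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return reject
```

For example, `holm_reject([0.01, 0.04, 0.03, 0.005])` returns `[True, False, False, True]`: the two smallest p-values pass the thresholds 0.05/4 and 0.05/3, and 0.03 fails 0.05/2, which ends the procedure.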
Abstract
We introduce a new clustering method based on Cluster Catch Digraphs (CCDs). The new method addresses the limitations of RK-CCDs by employing a variant of the spatial randomness test that uses the nearest neighbor distance (NND) instead of Ripley's K function used by RK-CCDs. We conduct a comprehensive Monte Carlo analysis to assess the performance of our method, considering factors such as dimensionality, data set size, number of clusters, cluster volumes, and inter-cluster distance. Our method is particularly effective for high-dimensional data sets, performing comparably to or better than KS-CCDs and RK-CCDs, which rely on a KS-type statistic and Ripley's K function, respectively. We also evaluate our methods on real and complex data sets, comparing them to well-known clustering methods. Again, our methods exhibit competitive performance, producing high-quality clusters with desirable properties.
Keywords: Graph-based clustering, Cluster catch digraphs, High-dimensional data, Nearest neighbor distance, Spatial randomness test
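The abstract does not give the exact form of the NND-based spatial randomness test. As a rough illustration only, the sketch below assumes that the statistic is the mean nearest neighbor distance of the points inside a candidate covering ball, that the null hypothesis is uniformity (complete spatial randomness) within that ball, and that the p-value comes from Monte Carlo simulation with a lower-tail alternative (clumped points give smaller NNDs). The function names `mean_nnd`, `uniform_in_ball`, and `mc_nnd_test`, and the default of 99 simulations, are hypothetical choices, not taken from the paper.

```python
import numpy as np

def mean_nnd(points):
    """Mean Euclidean distance from each point to its nearest other point."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return dist.min(axis=1).mean()

def uniform_in_ball(n, d, radius, rng):
    """Draw n points uniformly from the d-dimensional ball of given radius at the origin."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions on the sphere
    r = radius * rng.random(n) ** (1.0 / d)        # radial density proportional to r^(d-1)
    return g * r[:, None]

def mc_nnd_test(points, center, radius, n_sim=99, seed=0):
    """Monte Carlo p-value for H0: points inside the ball are uniform (assumed test).

    Lower tail: a mean NND smaller than expected under uniformity suggests clumping.
    """
    rng = np.random.default_rng(seed)
    inside = points[np.linalg.norm(points - center, axis=1) <= radius]
    n, d = inside.shape
    if n < 2:
        return 1.0  # too few points to compute an NND
    observed = mean_nnd(inside)
    # NND is translation invariant, so simulating at the origin is equivalent
    # to simulating inside the ball around `center`.
    simulated = np.array(
        [mean_nnd(uniform_in_ball(n, d, radius, rng)) for _ in range(n_sim)]
    )
    return (1 + np.sum(simulated <= observed)) / (n_sim + 1)
```

One plausible use in a CCD-style procedure is to evaluate such a test over a sequence of candidate radii around a center and keep a radius at which uniformity is not rejected; the precise search order and multiple-testing correction used by UN-CCDs are not reproduced in this sketch.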
