EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics

Randolph Wiredu-Aidoo

TL;DR

EVINGCA introduces an adaptive, density-variance-based clustering method on a nearest-neighbor graph, replacing global density thresholds with evolving local statistics to handle heterogeneous densities and complex manifolds. It combines two hierarchical filters (Level 1 for density-variance-based expansion and Level 2 for per-dimension shape preservation) with a small-cluster management policy and heuristic modulators to yield coherent, scalable clusterings. The approach demonstrates strong expressive capacity on irregular non-convex structures while remaining competitive on convex and overlapping datasets, with robustness to approximate nearest-neighbor indexing and favorable runtime scaling in high dimensions. The work provides a practical clustering framework that balances accuracy, stability, and scalability, and suggests avenues for unsupervised tuning, adaptive preprocessing, and backend optimizations to extend applicability to large-scale, real-time tasks.

Abstract

Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm), a density-variance based clustering algorithm that treats cluster formation as an adaptive, evolving process on a nearest-neighbor graph. EVINGCA expands rooted graphs via breadth-first search, guided by continuously updated local distance and shape statistics, replacing fixed density thresholds with local statistical feedback. With spatial indexing, EVINGCA features log-linear complexity in the average case and exhibits competitive performance against baselines across a variety of synthetic, real-world, low-d, and high-d datasets.
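The idea of breadth-first expansion guided by continuously updated distance statistics can be illustrated with a small sketch. This is not the paper's algorithm: the acceptance rule (edge length at most the running mean plus `tolerance` standard deviations), the function names `knn_graph` and `bfs_cluster`, and the `tolerance` parameter are all assumptions made for illustration. The sketch also omits the shape statistics and uses a brute-force neighbor search instead of spatial indexing.

```python
from collections import deque

import numpy as np


def knn_graph(X, k):
    """Brute-force k nearest neighbors (indices, distances), excluding self."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    dist = np.take_along_axis(d, idx, axis=1)
    return idx, dist


def bfs_cluster(X, k=5, tolerance=3.0):
    """Grow clusters by BFS over the kNN graph.

    Hypothetical rule: an edge is accepted if its length is within
    mean + tolerance * std of the edge lengths seen so far in the cluster.
    """
    n = len(X)
    idx, dist = knn_graph(X, k)
    labels = np.full(n, -1)
    cluster = 0
    for root in range(n):
        if labels[root] != -1:
            continue
        labels[root] = cluster
        queue = deque([root])
        # Seed the running statistics with the root's neighbor distances.
        accepted = list(dist[root])
        while queue:
            u = queue.popleft()
            for v, d_uv in zip(idx[u], dist[u]):
                if labels[v] != -1:
                    continue
                # Statistics are recomputed as the cluster evolves.
                mu, sigma = np.mean(accepted), np.std(accepted)
                if d_uv <= mu + tolerance * sigma:
                    labels[v] = cluster
                    accepted.append(d_uv)
                    queue.append(v)
        cluster += 1
    return labels
```

Because the threshold is derived from each cluster's own accepted edges rather than a global radius, a dense cluster and a sparse cluster can both grow under the same `tolerance` value, which is the property the abstract contrasts with fixed density thresholds.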

Paper Structure

This paper contains 39 sections, 16 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Demonstration of the influence of expansion on a single Gaussian cluster. As expansion increases, the central cluster grows in size.
  • Figure 2: Demonstration of L1 vs L2 on a rectangular point set. L1 fuses all points into one cluster while L2 preserves the linear shape of each side of the rectangle.
  • Figure 4: A 3D plot of a cluster in the Fish dataset. This cluster, similar to others, is composed of flattened x-y sheets, biasing EVINGCA towards over-segmentation.
  • Figure 5: Demonstration of the interactive effects of EVINGCA's parameters on Labirynth. Results can be sensitive to individual parameter changes, but binary-search-like adjustments combined with small-cluster clean-up via min_cluster_size (mcs) can create the desired clusters. The final configuration (bottom right) achieves an ARI of 0.999 with ground truth.
  • Figure 6: Average anytime performance of EVINGCA across non-development datasets. The curve shows the mean best ARI as a function of parameter trials. EVINGCA improves rapidly, crossing 0.7 before starting to plateau (remaining improvement $\le$ 0.05 ARI units) near that level.
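The "mean best ARI as a function of parameter trials" in the Figure 6 caption can be read as a best-so-far curve averaged over datasets. A minimal sketch of that computation follows, assuming each dataset has one ARI score per trial in trial order. The function name `mean_best_so_far` and the numbers in the example are made up for illustration and are not results from the paper.

```python
import numpy as np


def mean_best_so_far(ari_by_dataset):
    """Average, over datasets, of the best ARI reached up to each trial.

    ari_by_dataset: array-like of shape (n_datasets, n_trials).
    """
    scores = np.asarray(ari_by_dataset, dtype=float)
    # Running maximum along the trial axis: best score found so far.
    best = np.maximum.accumulate(scores, axis=1)
    # Mean across datasets gives one value per trial count.
    return best.mean(axis=0)


# Hypothetical ARI scores for two datasets over four trials.
toy_scores = [
    [0.2, 0.5, 0.4, 0.8],
    [0.6, 0.3, 0.7, 0.7],
]
curve = mean_best_so_far(toy_scores)  # approximately [0.4, 0.55, 0.6, 0.75]
```

A curve built this way never decreases, so a plateau means that further trials have stopped finding better configurations on average.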