EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics
Randolph Wiredu-Aidoo
TL;DR
EVINGCA introduces an adaptive, density-variance-based clustering method on a nearest-neighbor graph, replacing global density thresholds with evolving local statistics to handle heterogeneous densities and complex manifolds. It combines two hierarchical filters (Level 1 for density-variance-based expansion and Level 2 for per-dimension shape preservation) with a small-cluster management policy and heuristic modulators to yield coherent, scalable clusterings. The approach demonstrates strong expressive capacity on irregular non-convex structures while remaining competitive on convex and overlapping datasets, with robustness to approximate nearest-neighbor indexing and favorable runtime scaling in high dimensions. The work provides a practical clustering framework that balances accuracy, stability, and scalability, and suggests avenues for unsupervised tuning, adaptive preprocessing, and backend optimizations to extend applicability to large-scale, real-time tasks.
Abstract
Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive to parameter settings. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm), a density-variance-based clustering algorithm that treats cluster formation as an adaptive, evolving process on a nearest-neighbor graph. EVINGCA expands rooted graphs via breadth-first search, guided by continuously updated local distance and shape statistics, replacing fixed density thresholds with local statistical feedback. With spatial indexing, EVINGCA runs in log-linear time in the average case and performs competitively against baselines on synthetic and real-world datasets in both low and high dimensions.
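To make the core idea concrete, the following is a minimal sketch of breadth-first expansion on a nearest-neighbor graph with an evolving distance statistic. It is not the EVINGCA implementation: the acceptance rule (edge length at most the running mean plus `z` running standard deviations), the function name `grow_clusters`, and the parameters `z`, `warmup`, and `rel_floor` are assumptions chosen for illustration. The shape-preserving filter, small-cluster policy, heuristic modulators, and spatial indexing are all omitted, and neighbors are found by brute force.

```python
import math
from collections import deque


def knn(points, k):
    """Brute-force k nearest neighbors: one sorted list of (distance, index) per point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def grow_clusters(points, k=4, z=3.0, warmup=2, rel_floor=0.25):
    """Label points by BFS over the k-NN graph.

    An edge is accepted when its length is at most mean + z * spread, where
    mean and spread are running statistics of the edges already accepted into
    the current cluster (hypothetical rule, for illustration only).
    """
    neighbors = knn(points, k)
    labels = [-1] * len(points)
    cluster = 0
    for root in range(len(points)):
        if labels[root] != -1:
            continue
        labels[root] = cluster
        n, mean, m2 = 0, 0.0, 0.0  # Welford running count, mean, sum of squares
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for dist, v in neighbors[u]:
                if labels[v] != -1:
                    continue
                # The first `warmup` edges are accepted unconditionally.
                if n > 0 and n >= warmup:
                    std = math.sqrt(m2 / n)
                    # Floor the spread so uniform spacing (std == 0) is not over-strict.
                    spread = max(std, rel_floor * mean)
                    if dist > mean + z * spread:
                        continue
                # Accept the edge and update the running statistics.
                n += 1
                delta = dist - mean
                mean += delta / n
                m2 += delta * (dist - mean)
                labels[v] = cluster
                queue.append(v)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
    print(grow_clusters(pts))
```

In this toy input each point's four nearest neighbors include points from the other group, so the graph alone does not separate them; the separation comes from the running statistics rejecting edges that are long relative to those already accepted. A fixed global radius plays no role, which is the contrast with threshold-based density methods that the abstract draws.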
