Hierarchical clustering that takes advantage of both density-peak and density-connectivity
Ye Zhu, Kai Ming Ting, Yuan Jin, Maia Angelova
TL;DR
This work formalizes two cluster notions, $η$-linked and $η$-density-connected clusters, to analyze and extend Density Peak (DP) clustering. It shows that DP targets $η$-linked clusters but has two fundamental weaknesses, which Local Contrast does not resolve. To address them, the authors introduce DC-HDP, a density-connected hierarchical clustering method that merges cluster modes only when they are connected by an $η$-density-connected path. DC-HDP preserves DP's efficiency while handling clusters of arbitrary shape and widely varying density. It also yields a dendrogram, which provides richer hierarchical cluster information and a principled way to extract flat clusters at the desired granularity. Empirically, DC-HDP outperforms a broad set of state-of-the-art clustering algorithms (density-based, hierarchical, and graph-based) across 28 datasets, with an average macro F-measure of 0.82 and competitive runtimes. The approach offers a rigorous foundation for hierarchical density-based clustering and practical gains in cluster discovery and interpretation.
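The standard DP procedure that the TL;DR builds on can be sketched as follows. This is a minimal sketch of ordinary Density Peak clustering, not the paper's $η$-linked formalism or DC-HDP. It assumes a cutoff-kernel density (the count of points within `d_c`), 2-D points, and centres chosen as the points with the largest `rho * delta`. The function names are hypothetical. Each point is linked to its nearest denser point, and labels flow down these links from the centres. This linking is presumably the structure the $η$-linked notion formalizes.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of standard DP; cutoff-kernel density and the
# rho * delta centre rule are assumptions, not the paper's definitions.
Point = Tuple[float, float]


def local_density(points: Sequence[Point], d_c: float) -> List[int]:
    """rho_i: number of other points strictly within cutoff distance d_c."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < d_c)
        for i, p in enumerate(points)
    ]


def dp_cluster(points: Sequence[Point], d_c: float, n_clusters: int) -> List[int]:
    """Minimal Density Peak clustering: returns one cluster label per point."""
    n = len(points)
    rho = local_density(points, d_c)
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n  # distance to the nearest point ranked above this one
    parent = [-1] * n  # index of that nearest higher-ranked point
    for rank in range(1, n):
        i = order[rank]
        nearest = min(order[:rank], key=lambda j: math.dist(points[i], points[j]))
        delta[i] = math.dist(points[i], points[nearest])
        parent[i] = nearest
    # The global peak has no higher-ranked point; give it the largest delta.
    delta[order[0]] = max(delta)
    gamma = [rho[i] * delta[i] for i in range(n)]
    # The global peak is always a centre; the rest are the top-gamma points.
    others = sorted(order[1:], key=lambda i: -gamma[i])
    centres = [order[0]] + others[: n_clusters - 1]
    label = [-1] * n
    for k, c in enumerate(centres):
        label[c] = k
    # Each non-centre point copies its parent's label; the parent precedes
    # it in `order`, so the parent's label is already assigned.
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(dp_cluster(pts, 1.0, 2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy example, the two blob centres have the highest density and the largest `delta`, so they become the two centres. The corner points then inherit labels from their blob's centre.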
Abstract
This paper focuses on density-based clustering, particularly the Density Peak (DP) algorithm and the density-connectivity-based DBSCAN, and proposes a new method that combines the individual strengths of these two methods to yield a density-based hierarchical clustering algorithm. Our investigation begins by formally defining the types of clusters that DP and DBSCAN are designed to detect. It then identifies the kinds of distributions on which DP and DBSCAN individually fail to detect all clusters in a dataset. These weaknesses inspire us to formally define a new kind of cluster and to propose a new method, called DC-HDP, that overcomes them and identifies clusters with arbitrary shapes and varied densities. In addition, the new method produces a richer clustering result in the form of a hierarchy, or dendrogram, which gives a better understanding of cluster structures. Our empirical evaluation shows that DC-HDP produces the best clustering results on 14 datasets in comparison with 7 state-of-the-art clustering algorithms.
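Density-connectivity, the DBSCAN notion that the abstract pairs with DP, can also order merges into a hierarchy. The small hypothetical sketch below illustrates this. It uses a density threshold `tau` and a hop radius `eps` as assumed stand-ins for the paper's $η$-based definitions, and it takes the densities `rho` as given. For two modes, it finds the highest threshold at which a chain of sufficiently dense points joins them. This is not the authors' DC-HDP algorithm. It only shows why two dense regions linked by a sparse bridge should merge low in a dendrogram rather than near the top.

```python
import math
from collections import deque
from typing import Optional, Sequence, Tuple

# Hypothetical illustration: tau and eps stand in for the paper's η-based
# definitions, which are not reproduced here.
Point = Tuple[float, float]


def density_connected(points: Sequence[Point], rho: Sequence[int], eps: float,
                      tau: float, a: int, b: int) -> bool:
    """True if a and b are joined by a chain of points whose density is >= tau
    and whose consecutive hops are at most eps apart (breadth-first search)."""
    if rho[a] < tau or rho[b] < tau:
        return False
    seen = {a}
    queue = deque([a])
    while queue:
        i = queue.popleft()
        if i == b:
            return True
        for j, q in enumerate(points):
            if j not in seen and rho[j] >= tau and math.dist(points[i], q) <= eps:
                seen.add(j)
                queue.append(j)
    return False


def merge_level(points: Sequence[Point], rho: Sequence[int], eps: float,
                a: int, b: int) -> Optional[int]:
    """Highest density threshold at which modes a and b are density-connected,
    i.e. the height at which they would join in a density-based dendrogram."""
    for tau in sorted(set(rho), reverse=True):
        if density_connected(points, rho, eps, tau, a, b):
            return tau
    return None  # not connected even when every point is admitted


points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (20, 0)]
rho = [5, 2, 1, 2, 5, 5]  # two dense bumps, a sparse bridge, a far group
print(merge_level(points, rho, 1.0, 0, 4))  # 1: joined only via the bridge
print(merge_level(points, rho, 1.0, 0, 5))  # None: never density-connected
```

The modes at indices 0 and 4 become connected only once the threshold drops to the bridge point's density. Point 5 is too far away to join either mode at any threshold.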
