Clustering by Mining Density Distributions and Splitting Manifold Structure
Zhichang Xu, Zhiguo Long, Hua Meng
TL;DR
This paper tackles the scalability and robustness of spectral clustering on structurally complex data by replacing granular-ball micro-clusters with pseudo-clusters derived from local density estimates. It introduces a splitting rule guided by manifold curvature that produces more convex sub-clusters, so that the similarity between pseudo-clusters can be measured with a simple Euclidean-based formula and depends less on their global shapes. The resulting method, MDMSC, has three stages: constructing pseudo-clusters via density peaks, curvature-based splitting, and spectral clustering on the pseudo-clusters. On synthetic and real datasets it shows better adaptability and accuracy than granular-ball-based methods, with favorable running times in many cases. The results indicate that combining local structure with manifold-aware splitting yields practical improvements for large-scale clustering, and they leave room for parallelization and adaptive hyperparameter tuning.
Abstract
Spectral clustering requires a time-consuming eigendecomposition of the Laplacian matrix of the similarity graph, which limits its applicability to large datasets. To improve efficiency, a top-down approach was recently proposed: it first divides the data into several micro-clusters (granular-balls), then splits those micro-clusters that are not "compact", and finally uses the micro-clusters as nodes of a similarity graph for more efficient spectral clustering. However, this top-down approach adapts poorly to unevenly distributed or structurally complex data, for two reasons. First, modeling each micro-cluster as a rough ball fails to capture the shape and structure of the data within a local range. Second, a splitting rule that targets only "compactness" is susceptible to noise and to variations in data density, and it yields micro-clusters of varying shapes whose similarity is hard to measure accurately. To resolve these issues, this paper first proposes to build micro-clusters from local structures, so that the complex structural information inside local neighborhoods is captured. Moreover, noting that Euclidean distance is better suited to convex sets, the paper proposes a splitting rule that couples local density with the manifold structure of the data, so that the similarity between the resulting micro-clusters can be characterized easily. A novel similarity measure between micro-clusters is then proposed for the final spectral clustering. Experiments on synthetic and real-world datasets demonstrate that the proposed method adapts better to structurally complex data than granular-ball-based methods.
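To make the idea of building micro-clusters from local structure more concrete, the following is a minimal sketch of one common density-peak-style construction, not the paper's actual procedure. It assumes a density estimate equal to the inverse of the mean distance to the k nearest neighbours, and it links each point to its nearest neighbour of strictly higher density; points with no such neighbour become micro-cluster roots. The function names `k_nearest` and `local_micro_clusters`, the density formula, and the parameter `k` are all illustrative assumptions. The curvature-based splitting and the spectral clustering stage are omitted.

```python
import math

def k_nearest(points, k):
    """For each point, return its k nearest neighbours as
    (distance, index) pairs, nearest first."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def local_micro_clusters(points, k=3):
    """Hypothetical helper: group points into micro-clusters by linking
    each point to its nearest higher-density neighbour."""
    neighbours = k_nearest(points, k)
    # Assumed density estimate: inverse of the mean k-NN distance.
    density = [len(nb) / (sum(d for d, _ in nb) + 1e-12)
               for nb in neighbours]
    # Each point starts as its own parent (a root).
    parent = list(range(len(points)))
    for i, nb in enumerate(neighbours):
        for _, j in nb:  # neighbours are ordered nearest first
            if density[j] > density[i]:
                parent[i] = j
                break
    # Links always point to strictly higher density, so following
    # parents terminates at a local density peak.
    labels = []
    for i in range(len(points)):
        root = i
        while parent[root] != root:
            root = parent[root]
        labels.append(root)
    return labels

# Two small cross-shaped groups, far apart from each other.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (10, 0), (11, 0), (10, 1), (9, 0), (10, -1)]
labels = local_micro_clusters(points, k=3)
print(labels)
```

In this toy input, each cross collapses onto its centre point, which has the highest local density, so the label of every point is the index of its group's centre. Because links only follow k-nearest-neighbour relations, a micro-cluster built this way cannot span two groups that are far apart relative to k, which is the sense in which the construction starts from local structure rather than from a global partition.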
