Block-Diagonal Guided DBSCAN Clustering

Weibing Zhao

TL;DR

DBSCAN often struggles with high-dimensional, large-scale data and is sensitive to the neighborhood parameter $\epsilon$ and density threshold $\delta$. This work introduces BD-DBSCAN, which builds a similarity graph that can be permuted to a block-diagonal form and then identifies clustering structure by grouping diagonal blocks. It advances the pipeline with a gradient-descent-based permutation routine, a DBSCAN-based points traversal that yields an augmented cluster ordering, and a split-and-refine diagonal-block search with theoretical guarantees. The method offers robustness to density variation, scalability to large datasets, and intuitive visualization of the clustering process. Empirical evaluation on twelve real-world benchmarks shows consistent superiority over state-of-the-art methods.

Abstract

Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and that can be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can then be determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the proposed problem. Additionally, we develop a DBSCAN-based points traversal algorithm that identifies clusters with high densities in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm that automatically searches for all diagonal blocks in the permuted graph, with theoretical optimality guarantees in specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance compared to the state-of-the-art clustering method on every dataset.
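
The permutation idea in the abstract can be illustrated on a toy graph. The sketch below is not the paper's traversal algorithm: it assumes an idealized similarity matrix `W` with zero weight between clusters and uses a plain breadth-first search (the hypothetical helper `traversal_order`) as a stand-in for the DBSCAN-based traversal. It only shows that reordering rows and columns by a traversal order makes same-cluster points contiguous, which exposes the diagonal blocks.

```python
from collections import deque

import numpy as np


def traversal_order(W, tol=1e-8):
    """Return a visiting order from breadth-first search over edges with weight > tol.

    Assumption: this simple BFS stands in for the paper's DBSCAN-based traversal.
    """
    n = W.shape[0]
    visited = np.zeros(n, dtype=bool)
    order = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            order.append(i)
            # Enqueue every unvisited neighbor of point i.
            for j in np.flatnonzero(W[i] > tol):
                if not visited[j]:
                    visited[j] = True
                    queue.append(j)
    return np.array(order)


# Toy similarity graph: points {0, 2, 4} form one cluster, {1, 3, 5} another.
labels = np.array([0, 1, 0, 1, 0, 1])
W = (labels[:, None] == labels[None, :]).astype(float)

order = traversal_order(W)
# Apply the same permutation to rows and columns.
W_perm = W[np.ix_(order, order)]
print(order)
print(W_perm)
```

In this toy case the interleaved clusters end up as two 3x3 blocks on the diagonal of `W_perm`. Real similarity graphs are noisy, which is why the abstract pairs the permutation with a separate block-search step.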

Paper Structure (6 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: The framework of the proposed BD-DBSCAN clustering method, including graph construction, permutation, and segmentation. The input is 60 images from the Extended Yale-B dataset [georghiades2001few]. The output is a pictorial representation of the cluster assignments for the 60 images. Faces with the same color belong to the same cluster.
  • Figure 2: Key definitions in DBSCAN. (a): the hyper-parameters that control the algorithm are $\epsilon$ and $\delta$, with $\delta=3$ in this example. A core point is red, a border point is green, and a noise point is blue. Point $q$ is directly $\epsilon$-reachable from $p$ since $q\in \mathcal{N}_\epsilon(p)$; points $q$ and $r$ are directly $\epsilon$-reachable from each other; (b): $p$ is $\epsilon$-reachable from $q$; (c): $p$ and $q$ are $\epsilon$-connected.

Theorems & Definitions (5)

  • Definition 1: classification of points
  • Definition 2: direct $\epsilon$-reachable
  • Definition 3: $\epsilon$-reachable
  • Definition 4: $\epsilon$-connected
  • Definition 5: cluster of DBSCAN
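
The classification of points in Definition 1 can be sketched in a few lines. The code below follows the standard DBSCAN convention rather than the paper's exact wording, which is not reproduced in this outline. It assumes Euclidean distance, assumes that $\mathcal{N}_\epsilon(p)$ includes $p$ itself, and treats a point as core when its neighborhood holds at least $\delta$ points. The function name `classify_points` and the toy coordinates are hypothetical.

```python
import numpy as np


def classify_points(X, eps, delta):
    """Label each row of X as 'core', 'border', or 'noise' (standard DBSCAN convention)."""
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # neighbors[p, q] is True when q lies in the eps-neighborhood of p (p included).
    neighbors = dists <= eps
    # Core point: at least delta points in its eps-neighborhood.
    is_core = neighbors.sum(axis=1) >= delta
    # Border point: not core, but inside the eps-neighborhood of some core point.
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))


X = np.array([
    [0.0, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [1.2, 0.0],  # within eps of one core point only
    [5.0, 5.0],  # isolated
])
kinds = classify_points(X, eps=0.8, delta=3)
print(kinds.tolist())  # ['core', 'core', 'core', 'border', 'noise']
```

With $\delta=3$, as in Figure 2, the first three points each have three or more neighbors and are core, the fourth is directly $\epsilon$-reachable from a core point without being core itself, and the fifth has no neighbors besides itself.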