SACA: Selective Attention-Based Clustering Algorithm

Meysam Shirdel Bilehsavar, Razieh Ghaedi, Samira Seyed Taheri, Xinqi Fan, Christian O'Reilly

TL;DR

The paper tackles the challenge of parameter tuning in density-based clustering by introducing SACA, a Selective Attention-Based Clustering Algorithm that derives a data-driven pruning threshold to isolate high-density cluster cores and then reintegrates sparse points. It combines a principled core-extraction phase with flexible reintegration (via nearest-neighbor or centroid-based assignment) and an intuitive Attention Selectivity Coefficient for multi-level pattern discovery. Across 16 benchmark datasets and six metrics, SACA demonstrates robust, accurate performance with minimal tuning, outperforming DBSCAN, HDBSCAN, and OPTICS in many scenarios. The approach emphasizes interpretability and practicality, offering memory-time trade-offs (including a KD-tree variant) and outlining directions for memory-efficient and noise-adaptive extensions.

Abstract

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the requirement of critical parameter tuning by users, which typically requires significant domain expertise. This paper introduces a novel density-based clustering algorithm loosely inspired by the concept of selective attention, designed to minimize reliance on parameter tuning for most applications. The proposed method computes an adaptive threshold to exclude sparsely distributed points and outliers, constructs an initial cluster framework, and subsequently reintegrates the filtered points to refine the final results. Extensive experiments on diverse benchmark datasets demonstrate the robustness, accuracy, and ease of use of the proposed approach, establishing it as a powerful alternative to conventional density-based clustering techniques.
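
The three phases named in the abstract (threshold-based pruning, clustering of the remaining dense points, and reintegration of the filtered points) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the threshold formula, so the sketch assumes the mean k-th-nearest-neighbor distance as a stand-in threshold, links dense points by connected components at that same radius, and reintegrates each filtered point by giving it the label of its nearest dense point. The function name `sketch_prune_and_reintegrate` and the parameter `k` are hypothetical.

```python
import numpy as np


def sketch_prune_and_reintegrate(points, k=3):
    """Illustrative prune -> cluster -> reintegrate pipeline.

    ASSUMPTION: the threshold rule (mean k-NN distance) and the linking
    rule (connected components at that radius) are stand-ins chosen for
    this sketch; they are not taken from the SACA paper.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a demo).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Distance to the k-th nearest neighbor; column 0 is the self-distance.
    kth = np.sort(dist, axis=1)[:, k]

    # Phase 1: data-driven threshold separates dense points from sparse ones.
    threshold = kth.mean()  # assumed rule, see docstring
    dense_mask = kth <= threshold
    dense_idx = np.flatnonzero(dense_mask)

    # Phase 2: group dense points into connected components.
    labels = np.full(n, -1)
    next_label = 0
    for start in dense_idx:
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [int(start)]
        while stack:
            i = stack.pop()
            close = dist[i, dense_idx] <= threshold
            unvisited = labels[dense_idx] == -1
            neighbors = dense_idx[close & unvisited]
            labels[neighbors] = next_label
            stack.extend(neighbors.tolist())
        next_label += 1

    # Phase 3: reintegrate sparse points via their nearest dense point.
    sparse_idx = np.flatnonzero(~dense_mask)
    if len(dense_idx) and len(sparse_idx):
        nearest = np.argmin(dist[np.ix_(sparse_idx, dense_idx)], axis=1)
        labels[sparse_idx] = labels[dense_idx[nearest]]

    return labels, dense_mask


# Toy input: two tight 3x3 grids far apart, plus one stray point.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(3) for j in range(3)])
demo_points = np.vstack([grid, grid + 10.0, [[2.0, 2.0]]])
demo_labels, demo_dense = sketch_prune_and_reintegrate(demo_points, k=3)
```

On this toy input the stray point is filtered out in the first phase and then receives the label of the nearer grid in the last phase. The centroid-based reintegration mentioned in the TL;DR would replace the nearest-dense-point lookup with a nearest-centroid lookup.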

Paper Structure

This paper contains 23 sections, 20 equations, 14 figures, 5 tables, 1 algorithm.

Figures (14)

  • Figure 1: Simplified depiction of visual selective attention. The brain attempts to find relevant patterns of information, emphasizing the importance of structured information (blue and yellow clusters) over unstructured noise (red dots).
  • Figure 2: Graphical illustration of the method’s key steps (columns) using three examples of cluster distributions (rows).
  • Figure 3: Clustering results in the context of uniform cluster densities (using default parametrization): (a) Spiral, (b) Noisy Circles, (c) Complex-9.
  • Figure 4: Clustering results in the presence of non-uniform cluster densities (using default parametrization): (a) Smile 2 and (b) Unbalanced.
  • Figure 5: Clustering results on clusters with conflicts: (a) Birch1, (b) A3, (c) S2, and (d) 3-Compound.
  • ...and 9 more figures