
Almost Linear Time Consistent Mode Estimation and Quick Shift Clustering

Sajjad Hashemian

TL;DR

The paper tackles scalable density-based clustering in high dimensions by combining Locality-Sensitive Hashing (LSH), used to approximate kernel density estimation (KDE), with the Quick Shift framework. It introduces LSH-KDE and a fast Quick Shift variant (LSH-QuickShift) that builds a directed clustering graph in near-linear time and space while preserving consistency guarantees for mode estimation and for assigning points to modes. Theoretical results quantify the estimation error and the separation conditions required under approximate densities, and experiments on clustering and image segmentation show strong performance and scalability against established baselines. The result is a practical, provably consistent approach to large-scale, high-dimensional density-based clustering, with applications such as image segmentation.
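To make the "directed clustering graph" concrete, here is a minimal Quick Shift sketch. It is not the paper's LSH-QuickShift: it uses an exact, brute-force Gaussian KDE with O(n^2) cost, which is the quadratic work the LSH variant is designed to avoid. The function names `gaussian_kde`, `quick_shift`, and `assign_clusters`, the bandwidth `h`, and the maximum link length `tau` are illustrative assumptions.

```python
import math


def gaussian_kde(points, h):
    """Exact (unnormalised) Gaussian KDE at every input point: O(n^2) work."""
    n = len(points)
    return [
        sum(math.exp(-math.dist(x, y) ** 2 / (2 * h * h)) for y in points) / n
        for x in points
    ]


def quick_shift(points, density, tau):
    """Return parent[i]: the nearest point with strictly higher density lying
    closer than tau, or i itself if none exists (i is then a root / mode)."""
    parent = []
    for i, x in enumerate(points):
        best, best_d = i, tau
        for j, y in enumerate(points):
            if density[j] > density[i]:
                d = math.dist(x, y)
                if d < best_d:
                    best, best_d = j, d
        parent.append(best)
    return parent


def assign_clusters(parent):
    """Follow parent links to a root; the root index is the cluster label.
    Every link strictly increases density, so each chain terminates."""
    labels = []
    for i in range(len(parent)):
        r = i
        while parent[r] != r:
            r = parent[r]
        labels.append(r)
    return labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
dens = gaussian_kde(pts, h=0.5)
labels = assign_clusters(quick_shift(pts, dens, tau=1.0))
print(labels)  # [0, 0, 0, 3, 3, 3]
```

The two tight groups each collapse onto their densest point. Because the two groups are farther apart than `tau`, no link crosses between them, so each keeps its own mode.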

Abstract

In this paper, we propose a method for density-based clustering in high-dimensional spaces that combines Locality-Sensitive Hashing (LSH) with the Quick Shift algorithm. The Quick Shift algorithm, known for its hierarchical clustering capabilities, is extended by integrating approximate Kernel Density Estimation (KDE) using LSH to provide efficient density estimates. The proposed approach achieves almost linear time complexity while preserving the consistency of density-based clustering.
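As a rough illustration of how hashing can cut the cost of each density query, the sketch below evaluates a Gaussian kernel sum only over points that collide with the query in at least one of several hash tables. The tables are built from p-stable (Gaussian) random projections in the style of Datar et al.'s E2LSH. This is a simple bucket-restricted heuristic swapped in for illustration, not the hashing-based estimator of Charikar et al. cited by the paper. The class name `BucketKDE` and the parameters `num_tables`, `num_hashes`, and `width` are assumptions. Because only a subset of non-negative kernel terms is summed, the estimate never exceeds the exact (unnormalised) KDE.

```python
import math
import random
from collections import defaultdict


class BucketKDE:
    """Hypothetical bucket-restricted Gaussian KDE (illustration only).

    Each of `num_tables` tables hashes x to the tuple
    (floor((a_1 . x + b_1) / width), ..., floor((a_k . x + b_k) / width))
    with a_i ~ N(0, I) and b_i ~ U[0, width), i.e. p-stable LSH for l2.
    """

    def __init__(self, points, h, num_tables=8, num_hashes=2, width=4.0, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points, self.h, self.width = points, h, width
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(num_hashes)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for t, table in enumerate(self.tables):
                table[self._key(t, p)].append(idx)

    def _key(self, t, x):
        # Concatenate num_hashes quantised projections into one bucket key.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
            for a, b in self.projections[t]
        )

    def density(self, x):
        # Candidates = union of the query's buckets over all tables.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, x), ()))
        total = sum(
            math.exp(-math.dist(x, self.points[i]) ** 2 / (2 * self.h ** 2))
            for i in candidates
        )
        return total / len(self.points)


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
kde = BucketKDE(pts, h=0.5)
approx = [kde.density(p) for p in pts]
```

Adding tables enlarges the candidate set, which improves accuracy at the cost of more work. Adding hashes per table or shrinking `width` makes buckets more selective.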

Paper Structure

This paper contains 12 sections, 6 theorems, 27 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Proposition (Optimal LSH for $(c, r)$-ANNS)

For the Euclidean metric $\ell_2$ and any fixed $r > 0$, LSH yields data structures for solving $(k, c, r)$-ANNS with space $O(n^{1 + \rho} + d n)$ and query time $O(d n^{\rho})$, where $\rho = \frac{1}{c^2} - o(1)$.
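To get a feel for these bounds, the snippet below plugs numbers into the proposition's leading-order costs. It assumes the $o(1)$ term in $\rho$ can be dropped (so $\rho \approx 1/c^2$) and ignores constant factors. The values of `n`, `d`, and the approximation factors `c` are arbitrary illustrations, and `brute_force` is the $d n$ cost of a linear scan, included only for comparison.

```python
def lsh_costs(n, d, c):
    """Leading-order LSH costs for (c, r)-ANNS in l2, with rho ~ 1/c^2 (o(1) dropped)."""
    rho = 1.0 / c ** 2
    space = n ** (1 + rho) + d * n  # O(n^{1+rho} + d n)
    query = d * n ** rho            # O(d n^rho)
    return rho, space, query


n, d = 10**6, 100
brute_force = d * n  # cost of scanning every point once
for c in (1.5, 2.0, 3.0):
    rho, space, query = lsh_costs(n, d, c)
    print(f"c={c}: rho={rho:.3f}, space~{space:.2e}, query~{query:.2e}, "
          f"scan/query~{brute_force / query:.0f}x")
```

Under these assumptions, moving from $c = 1.5$ to $c = 3$ shrinks the query exponent from about 0.44 to about 0.11. The trade-off is a weaker guarantee on answer quality in exchange for faster queries.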

Figures (1)

  • Figure 1: Comparison of image segmentation algorithms. For each image, the number of detected segments (#), computation time (in seconds), and the segmentation method are indicated above.

Theorems & Definitions (13)

  • Definition: $(k, c, r)$-ANNS
  • Proposition: Optimal LSH for $(c, r)$-ANNS [o2014optimal]
  • Definition: Kernel Density Estimation
  • Proposition: Approximate KDE via LSH [charikar2020kernel]
  • Theorem: Computational Complexity
  • Proof
  • Theorem: Mode estimation via LSH-KDE Quick Shift
  • Proof
  • Definition: $(r,\delta)$-separation [dasgupta2014optimal]
  • Theorem
  • ...and 3 more