Consistent Spectral Clustering in Hyperbolic Spaces

Sagar Ghosh, Swagatam Das

TL;DR

This work introduces HSCA, a spectral clustering framework that operates in hyperbolic space to better capture hierarchical and tree-like data structures, which Euclidean representations handle poorly. The method embeds data into the Poincaré disc, constructs geodesic-based affinities, and performs spectral clustering on the normalized hyperbolic Laplacian; it is shown to be weakly consistent and to converge at least as fast as Euclidean spectral clustering. The paper also adapts established Euclidean techniques to the hyperbolic setting (e.g., landmark-based HSCA-HLS K and fast variants) and provides extensive empirical validation on real and synthetic datasets, showing improved clustering quality, particularly for hierarchical data. Together, the theoretical guarantees, practical algorithms, and ablation studies suggest that non-Euclidean spaces, especially hyperbolic geometry, offer a powerful framework for efficient and meaningful clustering in complex data regimes.

Abstract

Clustering, as an unsupervised technique, plays a pivotal role in various data analysis applications. Among clustering algorithms, Spectral Clustering on Euclidean Spaces has been extensively studied. However, with the rapid evolution of data complexity, Euclidean Space is proving to be inefficient for representing and learning algorithms. Although Deep Neural Networks on hyperbolic spaces have gained recent traction, clustering algorithms or non-deep machine learning models on non-Euclidean Spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on Hyperbolic Spaces to address this gap. Hyperbolic Spaces offer advantages in representing complex data structures like hierarchical and tree-like structures, which cannot be embedded efficiently in Euclidean Spaces. Our proposed algorithm replaces the Euclidean Similarity Matrix with an appropriate Hyperbolic Similarity Matrix, demonstrating improved efficiency compared to clustering in Euclidean Spaces. Our contributions include the development of the spectral clustering algorithm on Hyperbolic Spaces and the proof of its weak consistency. We show that our algorithm converges at least as fast as Spectral Clustering on Euclidean Spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, highlighting the superior performance of Hyperbolic Spectral Clustering over its Euclidean counterpart. This work opens up avenues for utilizing non-Euclidean Spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.
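
The idea of swapping the Euclidean similarity matrix for a hyperbolic one can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the points already lie inside the open unit (Poincaré) disc, that the affinity is a Gaussian of the geodesic distance with a hypothetical bandwidth parameter `a`, and that a small hand-written k-means is acceptable for the final step. All function names are illustrative.

```python
import numpy as np

def poincare_distance(X):
    """Pairwise geodesic distances for points X (n, d) inside the open unit ball."""
    sq_norms = np.sum(X**2, axis=1)
    diff_sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    denom = (1.0 - sq_norms)[:, None] * (1.0 - sq_norms)[None, :]
    # Clamp at 1 to guard against round-off before arccosh.
    return np.arccosh(np.maximum(1.0 + 2.0 * diff_sq / denom, 1.0))

def hyperbolic_gaussian_affinity(X, a=1.0):
    """Assumed affinity: exp(-a * d_H(x, y)^2), with a zero diagonal."""
    W = np.exp(-a * poincare_distance(X)**2)
    np.fill_diagonal(W, 0.0)
    return W

def spectral_embedding(W, k):
    """Bottom-k eigenvectors of the symmetric normalized Laplacian, row-normalized."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = eigvecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Tiny Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c)**2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :])**2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Toy data: two tight groups on opposite sides of the disc (synthetic, for illustration).
rng = np.random.default_rng(0)
group_a = np.array([0.6, 0.0]) + 0.03 * rng.standard_normal((20, 2))
group_b = np.array([-0.6, 0.0]) + 0.03 * rng.standard_normal((20, 2))
X = np.vstack([group_a, group_b])

W = hyperbolic_gaussian_affinity(X, a=1.0)
labels = kmeans(spectral_embedding(W, k=2), k=2)
```

Only `hyperbolic_gaussian_affinity` differs from a standard Euclidean pipeline; the Laplacian, eigendecomposition, and k-means steps are unchanged, which is the sense in which the similarity matrix is "replaced".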

Paper Structure

This paper contains 27 sections, 5 theorems, 35 equations, 6 figures, 3 tables.

Key Result

Lemma 5.1

For the usual Euclidean Gaussian kernel given by $K(x,y)=\exp(-a\|x-y\|^2)$, we have $K_{H_G}(x,y)\leq K(x,y)$ whenever $x,y\in H$.
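
A quick numerical sanity check of this inequality is sketched below. It rests on assumptions not stated in the lemma itself: that $H$ is the Poincaré disc with curvature $-1$, and that $K_{H_G}(x,y)=\exp(-a\,d_H(x,y)^2)$ with $d_H$ the geodesic distance. Under those assumptions the conformal factor $2/(1-\|x\|^2)$ is at least 2, so $d_H(x,y)\geq\|x-y\|$ and the inequality is expected to hold. The function names and the value of `a` are illustrative.

```python
import numpy as np

def poincare_dist(x, y):
    """Geodesic distance between two points of the open unit disc."""
    num = 2.0 * np.sum((x - y)**2)
    den = (1.0 - np.sum(x**2)) * (1.0 - np.sum(y**2))
    return np.arccosh(1.0 + num / den)

def euclidean_gaussian(x, y, a):
    return np.exp(-a * np.sum((x - y)**2))

def hyperbolic_gaussian(x, y, a):
    # Assumed form of K_{H_G}: Gaussian of the geodesic distance.
    return np.exp(-a * poincare_dist(x, y)**2)

def sample_disc(rng, n, max_radius=0.99):
    """Uniform samples from a disc of radius max_radius < 1."""
    r = max_radius * np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

rng = np.random.default_rng(0)
xs, ys = sample_disc(rng, 1000), sample_disc(rng, 1000)
a = 0.5  # illustrative bandwidth

# Count pairs where the hyperbolic kernel exceeds the Euclidean one.
violations = sum(
    hyperbolic_gaussian(x, y, a) > euclidean_gaussian(x, y, a)
    for x, y in zip(xs, ys)
)
```

A random check of this kind is not a proof; it only confirms that the stated bound is consistent with the assumed kernel on sampled pairs.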

Figures (6)

  • Figure 1: Consider the Wisconsin Breast Cancer Dataset taken from the UCI Machine Learning Repository. The leftmost figure shows a t-SNE visualization of the clusters of two types of tumors: malignant (yellow dots) and benign (brown dots). Due to the presence of one predominant connected component, Euclidean spectral clustering forms only one cluster (with only one isolated point in the other cluster), whereas hyperbolic spectral clustering forms the clusters after separating the two hierarchies present in the data. Clearly, the hyperbolic clusters provide a much more accurate description of the dataset than their Euclidean counterparts.
  • Figure 2: Geodesics in Different Model Hyperbolic Spaces
  • Figure 3: Embedding of the dataset from the Euclidean Space into the Poincaré Disc. The left figure shows what a natural hierarchy looks like in the Euclidean Space; the right figure shows what the embedded dataset looks like on the Poincaré Disc.
  • Figure 4: t-SNE Visualization of the Airport Dataset and Clusters
  • Figure 7: Visualization of ARI Values with respect to the hyperparameter $\frac{1}{\sigma^2}$ (for Gaussian) and $\frac{1}{2\sigma}$ (for Poisson)
  • ...and 1 more figures

Theorems & Definitions (18)

  • Lemma 5.1
  • proof
  • Remark 5.1
  • Remark 5.2
  • Lemma 5.2
  • proof
  • Lemma 5.3
  • proof
  • Lemma 5.4
  • proof
  • ...and 8 more