Deep Spectral Clustering via Joint Spectral Embedding and Kmeans

Wengang Guo, Wei Ye

TL;DR

The paper tackles two key problems in spectral clustering: the difficulty of constructing a meaningful similarity graph in high dimensions, and the decoupled optimization of embedding and clustering. It introduces Deep Spectral Clustering (DSC), which jointly learns spectral embeddings and Kmeans-friendly representations via two modules: a spectral embedding module that combines a deep autoencoder with power iteration, and a greedy Kmeans module that rotates the embeddings to reveal the direction with the worst cluster structure. Both modules are trained with a unified joint loss $\text{L}_{\text{joint}}=\text{L}_{\text{spectral}}+\text{L}_{\text{greedy}}$. DSC computes a self-tuning affinity matrix $A$ and applies power iteration to obtain embeddings $Z$, then optimizes the embeddings along the worst-structured direction using an orthogonal rotation $V$ and a target matrix $Y$, which enables end-to-end training. Experiments on seven real-world datasets show that DSC achieves state-of-the-art clustering performance; ablations confirm that both the spectral embedding and greedy Kmeans components are necessary, and further results indicate good generalization to unseen data. The method offers a practical, scalable approach to deep spectral clustering with competitive running times and robust clustering quality.
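
The affinity and power-iteration steps mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the common self-tuning form $A_{ij}=\exp(-\lVert x_i-x_j\rVert^2/(\sigma_i\sigma_j))$, where $\sigma_i$ is the distance from $x_i$ to its $k$-th nearest neighbor, and it uses plain orthogonal (subspace) iteration on the normalized affinity. The function names, the default `k=7`, and the spectral shift are assumptions made for illustration, and the sketch operates on raw inputs rather than autoencoder features.

```python
import numpy as np

def self_tuning_affinity(X, k=7):
    """Self-tuning affinity (assumed form): exp(-d_ij^2 / (sigma_i * sigma_j))."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Local scale sigma_i: distance to the k-th nearest neighbour
    # (index 0 of each sorted row is the point itself).
    sigma = np.sqrt(np.sort(d2, axis=1)[:, k])
    A = np.exp(-d2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def power_iterate(A, dim, n_iter=50, seed=0):
    """Approximate the top-`dim` eigenvectors of the normalised affinity."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    W = A * inv_sqrt[:, None] * inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    # Shift the spectrum from [-1, 1] into [0, 1] so that the iteration
    # converges to the largest (not largest-magnitude) eigenvalues.
    W = 0.5 * (W + np.eye(A.shape[0]))
    rng = np.random.default_rng(seed)
    Z, _ = np.linalg.qr(rng.standard_normal((A.shape[0], dim)))
    for _ in range(n_iter):
        # Multiply by W, then re-orthonormalise the columns with QR.
        Z, _ = np.linalg.qr(W @ Z)
    return Z
```

In this sketch the QR step keeps the columns of `Z` orthonormal, which avoids the collapse onto a single dominant eigenvector that unnormalized power iteration would produce.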

Abstract

Spectral clustering is a popular clustering method. It first maps data into the spectral embedding space and then uses Kmeans to find clusters. However, the two decoupled steps prohibit joint optimization for the optimal solution. In addition, it needs to construct the similarity graph for samples, which suffers from the curse of dimensionality when the data are high-dimensional. To address these two challenges, we introduce Deep Spectral Clustering (DSC), which consists of two main modules: the spectral embedding module and the greedy Kmeans module. The former module learns to efficiently embed raw samples into the spectral embedding space using deep neural networks and power iteration. The latter module improves the cluster structures of Kmeans on the learned spectral embeddings by a greedy optimization strategy, which iteratively reveals the direction of the worst cluster structures and optimizes embeddings in this direction. To jointly optimize spectral embeddings and clustering, we seamlessly integrate the two modules and optimize them in an end-to-end manner. Experimental results on seven real-world datasets demonstrate that DSC achieves state-of-the-art clustering performance.
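
For reference, the classical decoupled pipeline that the abstract contrasts with DSC can be sketched as follows. This is a baseline illustration rather than anything taken from the paper: it assumes a fixed-bandwidth Gaussian affinity, an exact eigendecomposition, row normalization of the embedding, and a bare-bones Lloyd's Kmeans. The function names and the `sigma` parameter are hypothetical. The point to notice is that `kmeans` only consumes the output of `spectral_embed`, so nothing learned during clustering can feed back into the embedding.

```python
import numpy as np

def spectral_embed(X, n_clusters, sigma=1.0):
    """Step 1: embed samples with the top eigenvectors of D^{-1/2} A D^{-1/2}."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))  # fixed-bandwidth Gaussian affinity
    np.fill_diagonal(A, 0.0)
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    W = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    W = 0.5 * (W + W.T)  # enforce exact symmetry before eigh
    _, vecs = np.linalg.eigh(W)  # eigenvalues are returned in ascending order
    Z = vecs[:, -n_clusters:]
    # Row-normalise so each sample lies on the unit sphere.
    return Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)

def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Step 2: Lloyd's algorithm on the fixed embedding Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def two_step_spectral_clustering(X, n_clusters, sigma=1.0):
    # The two stages are optimised separately: the embedding is frozen
    # before Kmeans ever sees it.
    return kmeans(spectral_embed(X, n_clusters, sigma), n_clusters)
```

The pairwise affinity also requires the full $n \times n$ distance matrix on raw inputs, which is where the curse of dimensionality noted in the abstract enters.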

Paper Structure

This paper contains 17 sections, 12 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Embedding visualization at different training iterations (Ite) of DSC on a subset of the FASHION dataset (Xiao et al., 2017). The embedding layer of the autoencoder (AE) is set to two neurons to allow direct 2D visualization. Sample colors denote the ground-truth clusters, whereas the background color denotes the Kmeans clustering results. Numbers in parentheses are the clustering evaluation metrics ACC% and NMI%, respectively.
  • Figure 2: The overall architecture of the proposed DSC.
  • Figure 3: Running time comparison (in seconds).
  • Figure 4: Embedding visualization using PCA.