SpectralNet: Spectral Clustering using Deep Neural Networks

Uri Shaham, Kelly Stanton, Henry Li, Boaz Nadler, Ronen Basri, Yuval Kluger

TL;DR

A deep learning approach to spectral clustering that overcomes the major limitations of scalability and generalization of the spectral embedding and applies VC dimension theory to derive a lower bound on the size of SpectralNet.

Abstract

Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at https://github.com/kstant0725/SpectralNet.
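
To make the ingredients named in the abstract concrete, here is a minimal NumPy sketch of the classical pipeline that SpectralNet approximates with a network: pairwise Gaussian affinities, the graph Laplacian, and its bottom eigenvectors. It also shows one possible way to keep a batch of outputs orthonormal, using a Cholesky factor, and a common form of the spectral clustering objective. All function names, the bandwidth `sigma`, and the toy data are illustrative assumptions; the abstract does not specify how the output layer or the loss is implemented, so this should not be read as the authors' code.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian affinities W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectral_embedding(W, k):
    """Eigenpairs of the unnormalized Laplacian L = D - W for the k smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]


def orthonormalize(Y):
    """Assumed constraint step: return Y_tilde with Y_tilde.T @ Y_tilde / m = I."""
    m = Y.shape[0]
    chol = np.linalg.cholesky(Y.T @ Y / m)  # lower-triangular C with C @ C.T = Y.T @ Y / m
    return Y @ np.linalg.inv(chol).T


def spectral_loss(Y, W):
    """Assumed objective: (1/m^2) * sum_ij W_ij * ||y_i - y_j||^2 = (2/m^2) * tr(Y.T L Y)."""
    m = Y.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    return 2.0 * np.trace(Y.T @ L @ Y) / m**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical, well-separated 2D blobs of 20 points each.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)),
                   rng.normal(0, 0.3, (20, 2)) + [5.0, 0.0]])
    W = gaussian_affinity(X)
    eigvals, Y = spectral_embedding(W, k=2)
    labels = (Y[:, 1] > 0).astype(int)  # sign of the second eigenvector splits the blobs
    print("smallest eigenvalues:", eigvals)
    print("labels:", labels)
    print("loss at the eigenvector embedding:", spectral_loss(orthonormalize(Y), W))
```

The eigendecomposition in `spectral_embedding` is the step that scales poorly and offers no out-of-sample extension; in the approach described above, a trained network takes its place, with the orthogonality constraint applied to each minibatch of outputs.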

Paper Structure

This paper contains 21 sections, 5 theorems, 34 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Theorem 4.1

$\text{VC dim}(\mathcal{F}^{\text{spectral clustering}}_n) \ge \frac{1}{10}n$.

Figures (8)

  • Figure 1: Illustrative examples showing the results of our SpectralNet clustering (top) compared to typical results obtained with DCN, VaDE, DEPICT and IMSAT (bottom) on simulated datasets in 2D and 3D. Our approach successfully finds these non-convex clusters, whereas the competing algorithms fail on all five examples. (The full set of results for these algorithms is shown in Figure 5 in the appendix.)
  • Figure 2: Grassmann distance as a function of iteration update for the MNIST dataset.
  • Figure 3: Illustrative 2D demo for semi-supervised learning using SpectralNet. Left: SpectralNet fails to recognize the true cluster structure due to the heavy noise. Right: using a randomly chosen 2% of the labels, the true cluster structure is recognized.
  • Figure 4: SpectralNet performance on the nested 'C' example. Top row: clustering using SpectralNet (left), spectral clustering (center), and $k$-means (right). Bottom row, left panel: SpectralNet outputs (plotted in blue and green) vs. the true eigenvectors. Bottom row, right panel: loss and Grassmann distance as a function of iteration number; the horizontal axis is in units of 100 parameter updates.
  • Figure 5: From top to bottom: results of DCN, VaDE, DEPICT and IMSAT on our illustrative datasets.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Theorem 4.1
  • Corollary 4.2
  • proof
  • Definition C.1: $(\alpha,\beta)$-separated graph
  • Lemma C.2
  • proof
  • Lemma C.3
  • proof
  • Lemma C.4
  • proof
  • ...and 1 more