Multi-View Spectral Clustering for Graphs with Multiple View Structures

Yorgos Tsitsikas, Evangelos E. Papalexakis

TL;DR

This work presents a general clustering framework that subsumes a series of seemingly disparate clustering methods, including various methods belonging to the widely popular spectral clustering framework, and proposes GenClus: a method that is simultaneously an instance of this framework and a generalization of spectral clustering, while also being closely related to k-means.

Abstract

Despite the fundamental importance of clustering, much of the relevant research still rests on ambiguous foundations, leaving it unclear whether or how the various clustering methods are connected with each other. In this work, we provide an additional stepping stone towards resolving such ambiguities by presenting a general clustering framework that subsumes a series of seemingly disparate clustering methods, including various methods belonging to the widely popular spectral clustering framework. The generality of the proposed framework also sheds light on the largely unexplored area of multi-view graphs in which each view may have differently clustered nodes. In turn, we propose GenClus: a method that is simultaneously an instance of this framework and a generalization of spectral clustering, while also being closely related to k-means. This results in a principled alternative to the few existing methods studying this special type of multi-view graph. We then conduct in-depth experiments, which demonstrate that GenClus is more computationally efficient than existing methods while attaining similar or better clustering performance. Lastly, a qualitative real-world case study further demonstrates the ability of GenClus to produce meaningful clusterings.
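
For context, the single-view spectral clustering that GenClus is described as generalizing can be sketched in a few lines. The snippet below is not GenClus and is not taken from the paper; it is the textbook two-cluster case, assuming a symmetric, non-negative adjacency matrix with no isolated nodes. The function name `spectral_bipartition` is hypothetical.

```python
import numpy as np

def spectral_bipartition(A):
    """Split a graph into two clusters using the sign of the Fiedler vector.

    A is assumed to be a symmetric, non-negative adjacency matrix in which
    every node has positive degree (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, V = np.linalg.eigh(L)               # eigenvalues in ascending order
    fiedler = V[:, 1]                      # eigenvector of the 2nd smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two 3-node cliques joined by one weak edge (toy data, not from the paper).
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
A[2, 3] = A[3, 2] = 0.1
labels = spectral_bipartition(A)
```

With more than two clusters, the usual recipe is to run k-means on the rows of the matrix formed by several leading eigenvectors, which is where the connection between spectral clustering and k-means typically enters.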

Paper Structure (55 sections, 1 theorem, 22 equations, 6 figures, 2 tables)

This paper contains 55 sections, 1 theorem, 22 equations, 6 figures, and 2 tables.

Key Result

Theorem 1.1

If $\mathbf{Y} \in \mathbb{R}^{I \times I}$ is a symmetric matrix with $E$ non-negative eigenvalues, then a positive semi-definite matrix, $\mathbf{S} \in \mathbb{R}^{I \times I}$, that minimizes $\left\lVert\mathbf{Y}-\mathbf{S}\right\rVert$ such that rank(…
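
The theorem concerns approximating a symmetric matrix by a positive semi-definite matrix under a rank constraint. As a hedged illustration, the sketch below implements the standard eigenvalue-truncation construction for this kind of problem. It assumes the Frobenius norm and a rank bound named `R`; both are assumptions, because the statement as shown here ends before the rank constraint and the conclusion. The function name `nearest_psd_low_rank` is hypothetical.

```python
import numpy as np

def nearest_psd_low_rank(Y, R):
    """Sketch: PSD matrix of rank at most R closest to symmetric Y.

    Assumes the Frobenius norm (an assumption of this sketch). Keeps the R
    largest eigenvalues of Y and sets any negative ones among them to zero.
    """
    Y = np.asarray(Y, dtype=float)
    Y = (Y + Y.T) / 2.0                    # guard against tiny asymmetries
    w, V = np.linalg.eigh(Y)               # eigenvalues in ascending order
    order = np.argsort(w)[::-1]            # re-sort to descending order
    w, V = w[order], V[:, order]
    w_kept = np.clip(w[:R], 0.0, None)     # discard negative eigenvalues
    return (V[:, :R] * w_kept) @ V[:, :R].T

# Toy input with E = 2 non-negative eigenvalues (3 and 1) and one negative (-2).
Y = np.diag([3.0, 1.0, -2.0])
S1 = nearest_psd_low_rank(Y, 1)   # keeps only the eigenvalue 3
S3 = nearest_psd_low_rank(Y, 3)   # the negative eigenvalue is zeroed out
```

In this toy case the result has rank at most min(`R`, $E$): asking for `R = 3` still yields a rank-2 matrix, because only two eigenvalues of `Y` are non-negative.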

Figures (6)

  • Figure 1: Example of multi-view graph with 6 views grouped into 3 view clusters, each corresponding to a different node clustering with 3, 2 and 2 node clusters, respectively. Each view is visualized via its adjacency matrix and darker colors represent lower edge weights.
  • Figure 2: For each arrow, the method on its right can be seen as a special case of the method on its left.
  • Figure 3: Clustering performance comparisons. Lines represent medians, while shaded areas represent 25th and 75th percentiles. Higher values signify better clustering quality and the maximum possible value is 1.
  • Figure 4: Clustering of airlines and airports by GenClus for the flights dataset. All colored bars are split into small pieces representing individual airports or airlines, and the colors represent labels. The left-hand bar in (a) depicts the airlines colored based on their continent of origin, while the right-hand bar shows them colored based on the view clustering produced by GenClus. (b)-(d) show the adjacency matrices of representative views from each of the airline clusters produced by GenClus, and the horizontal colored bars on top indicate the actual location of each airport. The airport clusters generated by GenClus are depicted as grey squares, which is achieved by appropriately permuting each adjacency matrix individually.
  • Figure 5: Execution time comparisons of original methods for varying graph sizes and embedding sizes. Log-log plots are used for all experiments, with each plot having powers of ten on its vertical axis scaled equally to powers of ten on its horizontal axis. Lines represent medians, while shaded areas represent 25th and 75th percentiles.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1.1