Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering

Huaming Ling, Chenglong Bao, Jiebo Song, Zuoqiang Shi

TL;DR

The paper tackles the scalability challenge of multi-view subspace clustering by introducing FSSMSC, a unified framework that learns a consensus anchor graph and landmarks’ low-dimensional representations in tandem. By jointly optimizing anchor graph construction and label propagation within a landmark-based, multi-view setting, and solving via an ADMM-like alternating algorithm with convergence guarantees, the approach achieves linear complexity with respect to data size. Empirical results across seven benchmark datasets show that FSSMSC outperforms scalable baselines in clustering accuracy while drastically reducing running times and memory usage, especially on large-scale data. The work demonstrates practical impact for large multi-view analytics, enabling efficient semi-supervised clustering where full affinity matrices are prohibitive.

Abstract

In this paper, we introduce a Fast and Scalable Semi-supervised Multi-view Subspace Clustering (FSSMSC) method, a novel solution to the high computational complexity commonly found in existing approaches. FSSMSC features linear computational and space complexity relative to the size of the data. The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks. Unlike traditional methods that manage the anchor graph construction and the label propagation process separately, this paper proposes a unified optimization model that facilitates simultaneous learning of both. An effective alternating update algorithm with convergence guarantees is proposed to solve the unified optimization model. Additionally, the method employs the obtained anchor graph and landmarks' low-dimensional representations to deduce low-dimensional representations for raw data. Following this, a straightforward clustering approach is conducted on these low-dimensional representations to achieve the final clustering results. The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
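The last two steps described in the abstract, deriving low-dimensional representations for the raw data from the anchor graph and the landmarks' representations and then clustering them, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the shapes (`F_landmarks` is k × m, `Z` is m × n with one column of landmark coefficients per data point), the propagation rule `F_landmarks @ Z`, the toy values, and the small k-means routine, which merely stands in for the unspecified "straightforward clustering approach". It is not the authors' implementation.

```python
import numpy as np

def propagate_representations(F_landmarks, Z):
    """Map landmark representations to data points (assumed rule: F @ Z).

    F_landmarks: (k, m) low-dimensional representations of m landmarks.
    Z:           (m, n) anchor graph; column i holds the coefficients that
                 express data point i as a combination of the landmarks.
    Returns a (k, n) array with one column per data point.
    """
    return F_landmarks @ Z

def kmeans(points, n_clusters, n_iters=20):
    """Plain Lloyd iterations with a deterministic farthest-first start."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Toy example (hypothetical values): 4 landmarks, 6 data points, 2 clusters.
F_landmarks = np.array([[1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])
Z = np.array([[0.7, 0.5, 1.0, 0.0, 0.0, 0.0],
              [0.3, 0.5, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.4, 1.0, 0.2],
              [0.0, 0.0, 0.0, 0.6, 0.0, 0.8]])

Y = propagate_representations(F_landmarks, Z)  # shape (k, n)
labels = kmeans(Y.T, n_clusters=2)             # one label per data point
```

In this toy setting each data point uses only landmarks from one group, so the first three points and the last three points fall into separate clusters. With a sparse `Z` holding a fixed number of nonzeros per column, the product touches each data point a constant number of times, which is consistent with the linear complexity the abstract claims.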

Paper Structure

This paper contains 25 sections, 15 theorems, 91 equations, 7 figures, 9 tables, 1 algorithm.

Key Result

Lemma 1

Let $h(\mathbf{B})=\frac{1}{2}\sum_{v=1}^V\|\mathbf{X}^{(v)}-\mathbf{U}^{(v)}\mathbf{B}\|^2$, and let $P$ and $Q$ be as given in eq:P and eq:Q. Then $\nabla_\mathbf{B} h(\mathbf{B})$ is Lipschitz continuous with a Lipschitz constant $L_h>0$. $\nabla_\mathbf{Z} P(\mathbf{Z})$ is Lipschitz continuous in $\mathcal{
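
The first claim of the lemma can be checked with a short calculation. This is only a sketch, assuming that $\|\cdot\|$ in the definition of $h$ denotes the Frobenius norm; the constant derived here is one valid choice of $L_h$ and may differ from the one used in the paper's proof.

$$
\nabla_{\mathbf{B}} h(\mathbf{B}) = \sum_{v=1}^{V} \mathbf{U}^{(v)\top}\left(\mathbf{U}^{(v)}\mathbf{B}-\mathbf{X}^{(v)}\right)
$$

$$
\left\|\nabla_{\mathbf{B}} h(\mathbf{B}_1)-\nabla_{\mathbf{B}} h(\mathbf{B}_2)\right\|_F
= \left\|\Big(\sum_{v=1}^{V}\mathbf{U}^{(v)\top}\mathbf{U}^{(v)}\Big)(\mathbf{B}_1-\mathbf{B}_2)\right\|_F
\le \left\|\sum_{v=1}^{V}\mathbf{U}^{(v)\top}\mathbf{U}^{(v)}\right\|_2 \left\|\mathbf{B}_1-\mathbf{B}_2\right\|_F
$$

Under this assumption, $L_h = \big\|\sum_{v=1}^{V}\mathbf{U}^{(v)\top}\mathbf{U}^{(v)}\big\|_2$ (the spectral norm) works, and it is strictly positive whenever at least one $\mathbf{U}^{(v)}$ is nonzero.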

Figures (7)

  • Figure 1: Clustering accuracy with different values of $\lambda_M$ and $\beta$ on six datasets.
  • Figure 2: Clustering accuracy with different values of $\lambda_Z$ on six datasets.
  • Figure 3: The numerical convergence curves for six datasets, where the stopping criterion at iteration $j$ is defined as $\text{StopC}=\max(\|\mathbf{B}^{j+1}-\mathbf{B}^{j}\|_\infty,\|\mathbf{Z}^{j+1}-\mathbf{Z}^{j}\|_\infty)$.
  • Figure 4: The evaluation metric ACC with different ratios of labeled samples on six datasets.
  • Figure 5: The evaluation metric NMI with different ratios of labeled samples on six datasets.
  • ...and 2 more figures
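
The stopping criterion quoted in the Figure 3 caption can be computed as in the sketch below. It assumes that $\|\cdot\|_\infty$ means the largest absolute entry of a matrix (the caption does not say whether an entrywise or an induced norm is intended). The function name `stop_criterion` and the sample values are hypothetical.

```python
import numpy as np

def stop_criterion(B_new, B_old, Z_new, Z_old):
    """StopC = max(||B_new - B_old||_inf, ||Z_new - Z_old||_inf).

    Assumption: the infinity norm is taken entrywise (largest absolute entry).
    """
    change_B = np.abs(B_new - B_old).max()
    change_Z = np.abs(Z_new - Z_old).max()
    return float(max(change_B, change_Z))

# Hypothetical consecutive iterates of B and Z.
B_old = np.zeros((2, 2))
B_new = np.array([[0.1, 0.0],
                  [0.0, -0.3]])
Z_old = np.ones((2, 3))
Z_new = Z_old + 0.05

stop_value = stop_criterion(B_new, B_old, Z_new, Z_old)
```

An alternating solver would typically evaluate this quantity after each pass over the variables and stop once it falls below a chosen tolerance; the tolerance itself is not given in this summary.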

Theorems & Definitions (31)

  • Lemma 1
  • Theorem 1
  • Definition 1: attouch2013convergence
  • Definition 2
  • Definition 3: Lipschitz Continuity
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • Lemma 3
  • ...and 21 more