S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering

Zhen Long, Qiyuan Wang, Yazhou Ren, Yipeng Liu, Ce Zhu

TL;DR

The paper tackles scalability in multi-view clustering by shifting focus from global correlations of anchor graphs or projection matrices to direct learning of embedding-feature correlations within and across views. It introduces the tensor low-frequency approximation (TLFA), applied to a rotated embedding tensor $\mathcal{B}$, to enforce intra-view similarity, while a consensus embedding $\tilde{\mathbf{B}}$ enforces inter-view semantic consistency. The model optimizes over projection matrices $\mathbf{U}_v$ and embedding features $\mathbf{B}_v$ with an auxiliary variable, enabling a separable alternating optimization that jointly yields a robust fused representation and the final clustering via $\mathbf{D}$ and $\mathbf{G}$. Experiments on six large-scale datasets show that S^2MVTC substantially improves clustering accuracy and reduces CPU time compared to state-of-the-art methods, demonstrating strong scalability for massive data; the authors also provide publicly available code. The approach offers practical impact for large-scale multi-view tasks in vision, neuroscience, and multimedia, where rapid and accurate clustering across many views is essential.
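
Considered in isolation from the rest of the objective, the consensus step has a closed form. Assuming a squared Frobenius penalty (an assumption: Figure 2 writes the unsquared norm, whose minimizer is in general not the mean, and the full model also couples $\tilde{\mathbf{B}}$ with other terms), setting the gradient with respect to $\tilde{\mathbf{B}}$ to zero recovers the view average that Figure 2 uses to define $\tilde{\mathbf{B}}$:

```latex
\min_{\tilde{\mathbf{B}}}\ \sum_{v=1}^{V}\|\mathbf{B}_v-\tilde{\mathbf{B}}\|_{\operatorname{F}}^2
\;\Longrightarrow\;
\frac{\partial}{\partial\tilde{\mathbf{B}}}\sum_{v=1}^{V}\|\mathbf{B}_v-\tilde{\mathbf{B}}\|_{\operatorname{F}}^2
= -2\sum_{v=1}^{V}\bigl(\mathbf{B}_v-\tilde{\mathbf{B}}\bigr) = \mathbf{0}
\;\Longrightarrow\;
\tilde{\mathbf{B}}=\frac{1}{V}\sum_{v=1}^{V}\mathbf{B}_v .
```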

Abstract

Anchor-based large-scale multi-view clustering has attracted considerable attention for its effectiveness in handling massive datasets. However, current methods mainly seek the consensus embedding feature for clustering by exploring global correlations between anchor graphs or projection matrices. In this paper, we propose a simple yet efficient scalable multi-view tensor clustering (S^2MVTC) approach, where our focus is on learning correlations of embedding features within and across views. Specifically, we first construct the embedding feature tensor by stacking the embedding features of different views into a tensor and rotating it. Additionally, we build a novel tensor low-frequency approximation (TLFA) operator, which incorporates graph similarity into embedding feature learning, efficiently achieving smooth representation of embedding features within different views. Furthermore, consensus constraints are applied to embedding features to ensure inter-view semantic consistency. Experimental results on six large-scale multi-view datasets demonstrate that S^2MVTC significantly outperforms state-of-the-art algorithms in terms of clustering performance and CPU execution time, especially when handling massive data. The code of S^2MVTC is publicly available at https://github.com/longzhen520/S2MVTC.
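
To make the tensor construction concrete, here is a minimal NumPy sketch under stated assumptions. It assumes each view's embedding $\mathbf{B}_v$ is a $d \times n$ array (embedding dimension by number of samples) and reads "rotating" as a mode permutation to $n \times V \times d$. In place of the paper's TLFA operator, whose exact definition is not reproduced here, it applies a generic graph low-pass filter: each view's features are projected onto the lowest-frequency eigenvectors of a Gaussian-similarity graph Laplacian. The helper names `stack_and_rotate` and `graph_low_pass` and the parameters `num_low` and `sigma` are hypothetical.

```python
import numpy as np


def stack_and_rotate(embeddings):
    # Stack per-view embeddings B_v (each d x n) into a (d, n, V) tensor, then
    # permute modes to (n, V, d). This permutation is an assumed reading of
    # "rotating"; the paper defines its own rotation.
    tensor = np.stack(embeddings, axis=2)
    return np.transpose(tensor, (1, 2, 0))


def graph_low_pass(features, num_low, sigma=1.0):
    # Generic graph low-pass filter (a stand-in, NOT the paper's TLFA):
    # keep only the num_low lowest-frequency components of the (n, d) sample
    # features on a dense Gaussian-similarity graph.
    sq = np.sum(features ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0)
    W = np.exp(-dist2 / (2.0 * sigma ** 2))  # pairwise similarities
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues ascending: low frequency first
    U_low = eigvecs[:, :num_low]             # (n, num_low) low-frequency basis
    return U_low @ (U_low.T @ features)      # projection onto that basis


# Toy example: V = 3 views, d = 4 embedding dims, n = 30 samples.
rng = np.random.default_rng(0)
views = [rng.standard_normal((4, 30)) for _ in range(3)]
T = stack_and_rotate(views)  # (30, 3, 4); T[:, v, :] is view v as (n, d)
smoothed = np.stack(
    [graph_low_pass(T[:, v, :], num_low=5) for v in range(T.shape[1])], axis=1
)
```

Keeping all $n$ components returns the features unchanged, while keeping only one returns each feature's mean over samples (the constant eigenvector of a connected graph), which shows the smoothing trade-off. The dense $n \times n$ graph and full eigendecomposition cost $O(n^3)$ and only illustrate what "low-frequency" means; they would not scale to the massive datasets the paper targets.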

Paper Structure (15 sections, 19 equations, 4 figures, 4 tables, 3 algorithms)

Figures (4)

  • Figure 1: Comparison of the frameworks of current methods.
  • Figure 2: The framework of S$^2$MVTC. (a) Each pre-given anchor graph $\phi(\mathbf{X}_v)$ is mapped by $\mathbf{U}_v$ to obtain the corresponding embedding feature $\mathbf{B}_v$. These embedding features are then stacked into a tensor $\mathcal{B}$. This tensor and the newly defined $\mathcal{B}_\textbf{TLFA}$ are used to explore intra-view correlations, while the term $\sum_{v=1}^{V}\|\mathbf{B}_v-\tilde{\mathbf{B}}\|_{\operatorname{F}}$ enforces semantic consistency across views, where $\tilde{\mathbf{B}}=\sum_{v=1}^{V}\frac{1}{V}\mathbf{B}_v$. (b) The tensor low-frequency approximation $\mathcal{B}_\textbf{TLFA}$.
  • Figure 3: Clustering performance on Caltech102 as the parameters $\lambda$, $\beta$, $L$, and $M$ vary.
  • Figure 4: The embedding feature learning process on the CCV dataset. From left to right, the nine columns show the embedding features after the 1st iteration, the 1st TLFA operator, the 2nd to 6th iterations, the 6th TLFA operator, and the 7th iteration, respectively. Each feature is visualized using t-SNE van2008visualizing: View 1 for SIFT features, View 2 for STIP features, View 3 for MFCC features, and 'Global' for the fusion of these features.
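
The data flow in Figure 2(a) can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: it assumes $\phi(\mathbf{X}_v)$ is an $m_v \times n$ anchor graph and $\mathbf{U}_v$ is $m_v \times d$, so that $\mathbf{B}_v = \mathbf{U}_v^\top \phi(\mathbf{X}_v)$ is $d \times n$; random matrices stand in for real anchor graphs and learned projections. The helper names `embed_views`, `consensus`, and `consensus_penalty` are hypothetical. The consensus $\tilde{\mathbf{B}}$ and the penalty follow the caption's formulas directly.

```python
import numpy as np


def embed_views(anchor_graphs, projections):
    # Assumed convention: phi(X_v) is (m_v x n), U_v is (m_v x d),
    # so B_v = U_v^T phi(X_v) is (d x n). The paper may orient these differently.
    return [U.T @ A for A, U in zip(anchor_graphs, projections)]


def consensus(embeddings):
    # B_tilde = (1/V) * sum_v B_v, as written in the Figure 2 caption.
    return np.mean(np.stack(embeddings, axis=0), axis=0)


def consensus_penalty(embeddings, b_tilde):
    # sum_v ||B_v - B_tilde||_F, unsquared as in the Figure 2 caption.
    return sum(np.linalg.norm(B - b_tilde, ord="fro") for B in embeddings)


# Toy example: V = 3 views with different anchor counts, n = 50 samples, d = 8.
rng = np.random.default_rng(1)
n, d, anchor_counts = 50, 8, (20, 30, 25)
anchors = [rng.random((m, n)) for m in anchor_counts]        # stand-ins for phi(X_v)
projs = [rng.standard_normal((m, d)) for m in anchor_counts]  # stand-ins for U_v
B = embed_views(anchors, projs)
B_tilde = consensus(B)
penalty = consensus_penalty(B, B_tilde)
tensor = np.stack(B, axis=2)  # (d, n, V) embedding tensor before any rotation
```

Because each view can use a different number of anchors $m_v$ while sharing the embedding dimension $d$, the per-view embeddings all have the same shape and can be stacked into one tensor.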

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6