Scalable Multi-view Clustering via Explicit Kernel Features Maps

Chakib Fettal, Lazhar Labiod, Mohamed Nadif

TL;DR

The paper tackles the scalability of multi-view clustering on large, attributed networks by introducing a framework that uses explicit kernel feature maps to form a consensus subspace affinity without costly iterations. By leveraging kernel summation, it factorizes the consensus into a low-dimensional embedding, enabling spectral-type clustering through efficient SVD and $k$-means. The main contributions are the MvSCK framework, a kernel-sum based consensus construction, and a view-weighting mechanism that improves clustering quality, all supported by extensive experiments on real-world, large-scale networks. The approach demonstrates strong clustering performance and superior running times, making scalable multi-view clustering feasible for datasets with millions of points and numerous views.
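The kernel-summation step can be written out as a short identity. This is a sketch under assumed notation that does not appear in the text above: $K_v$ is the affinity (kernel) matrix of view $v$, $\Phi_v$ is an explicit feature map with $K_v = \Phi_v \Phi_v^\top$, $w_v \ge 0$ is the weight of view $v$, and $V$ is the number of views.

$$
\sum_{v=1}^{V} w_v K_v \;=\; \sum_{v=1}^{V} \left(\sqrt{w_v}\,\Phi_v\right)\left(\sqrt{w_v}\,\Phi_v\right)^\top \;=\; Z Z^\top,
\qquad
Z = \left[\sqrt{w_1}\,\Phi_1 \;\; \cdots \;\; \sqrt{w_V}\,\Phi_V\right].
$$

Because the left singular vectors of $Z$ are eigenvectors of $Z Z^\top$, a truncated SVD of the tall matrix $Z$ yields the same spectral embedding as an eigendecomposition of the $n \times n$ consensus affinity, without ever forming that matrix.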

Abstract

The proliferation of high-dimensional data from sources such as social media, sensor networks, and online platforms has created new challenges for clustering algorithms. Multi-view clustering, which integrates complementary information from multiple data perspectives, has emerged as a powerful solution. However, existing methods often struggle with scalability and efficiency, particularly on large attributed networks. In this work, we address these limitations by leveraging explicit kernel feature maps and a non-iterative optimization strategy, enabling efficient and accurate clustering on datasets with millions of points.

Paper Structure

This paper contains 30 sections, 13 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Schematic diagram. The black section shows the operations that our approach performs explicitly; the green section shows what happens implicitly (operations in the same column correspond to the same step). Given a set of inputs, we apply a singular value decomposition (SVD) to obtain the left singular vectors (equivalent to implicitly computing a coefficient matrix using low-rank subspace clustering). We then apply a nonnegative feature map (equivalent to implicitly computing an affinity matrix). Next, we weight and concatenate the resulting embeddings of each view (equivalent to summing the affinity matrices into a single consensus matrix). Finally, we perform an SVD followed by $k$-means to obtain a consensus partition (equivalent to performing spectral clustering on the consensus affinity matrix). The novelty lies in extending fettal2023scalable to a multi-view setting efficiently through the properties of kernels.
  • Figure 2: Holm post-hoc mean rank test ($\alpha=0.01$) with respect to clustering performance.
  • Figure 3: Holm post-hoc mean rank test ($\alpha=0.01$) with respect to running times.
  • Figure 4: Sensitivity of MvSCK in terms of CA, CF1, NMI and ARI according to the temperature parameter $T$.
  • Figure 5: Clustering accuracy with and without the regularization vector $\boldsymbol{\lambda}$.
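
The pipeline described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The feature map is assumed to be that of the homogeneous quadratic kernel $(u^\top v)^2$, chosen only because it is nonnegative and has a small exact feature map. The view weights default to uniform rather than the paper's weighting mechanism. The $k$-means step is a plain Lloyd iteration with farthest-point initialization.

```python
import numpy as np


def view_embedding(X, rank):
    """Rank-truncated left singular vectors of one view (n x rank)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def quadratic_feature_map(U):
    """Explicit feature map of the kernel (u.v)^2 (an assumed choice).

    Each row becomes its flattened outer product, so that
    Phi @ Phi.T == (U @ U.T) ** 2 elementwise, which is nonnegative.
    """
    n, r = U.shape
    return np.einsum("ni,nj->nij", U, U).reshape(n, r * r)


def consensus_features(views, rank, weights=None):
    """Weight and concatenate per-view feature maps.

    The result Z satisfies Z @ Z.T == sum_v w_v * K_v, so the n x n
    consensus affinity is never formed explicitly.
    """
    if weights is None:
        # Placeholder: uniform weights, not the paper's weighting scheme.
        weights = np.full(len(views), 1.0 / len(views))
    blocks = [
        np.sqrt(w) * quadratic_feature_map(view_embedding(X, rank))
        for w, X in zip(weights, views)
    ]
    return np.hstack(blocks)


def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point init."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def multiview_cluster_sketch(views, rank, n_clusters, weights=None):
    """SVD per view -> feature map -> weighted concat -> SVD -> k-means."""
    Z = consensus_features(views, rank, weights)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return kmeans(U[:, :n_clusters], n_clusters)
```

Calling `multiview_cluster_sketch(views, rank, n_clusters)` with `views` as a list of `n x d_v` arrays returns one integer label per row. The feature dimension grows as `rank ** 2` per view under this assumed map, so the final SVD stays cheap only when `rank` is small.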