Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model

Hugo Lebeau, Mohamed El Amine Seddik, José Henrique de Morais Goulart

TL;DR

This work analyzes the performance gap between tensor-based and unfolding-based spectral methods for a nested matrix-tensor model used in multi-view clustering. Using random matrix theory, it derives the limiting spectral distributions of the unfoldings and identifies precise spike-detection thresholds, including a BBP-type transition governed by $\rho_T = \lim \frac{\beta_T^2 n_T}{\sqrt{n_1 n_2 n_3}}$; nontrivial recovery from unfoldings requires the regime $\beta_T = \Theta(n_T^{1/4})$. The tensor-based rank-one estimator can achieve recovery at $\Theta(1)$ SNR but is NP-hard to compute, whereas the unfolding approach needs a signal strength that grows with the dimension, which yields a quantifiable gap in achievable clustering accuracy. The results are corroborated by simulations, and the authors discuss using the unfolding estimate to initialize tensor methods. Overall, the paper clarifies when matrix unfoldings suffice and when full tensor spectral methods provide a tangible performance advantage for multi-view clustering.
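
A short scaling check shows why the definition of $\rho_T$ and the regime $\beta_T = \Theta(n_T^{1/4})$ go together. It assumes that each dimension grows proportionally, $n_i / n_T \to c_i \in (0, 1)$ for $i = 1, 2, 3$. This assumption is not stated in the summary itself, although it matches the values $c_1 = \frac{1}{2}$, $c_2 = \frac{1}{3}$, $c_3 = \frac{1}{6}$ quoted for $n_1 = 600$, $n_2 = 400$, $n_3 = 200$ in the figure captions.

$$
\frac{\beta_T^2 \, n_T}{\sqrt{n_1 n_2 n_3}}
= \frac{\beta_T^2 \, n_T}{\sqrt{c_1 c_2 c_3} \; n_T^{3/2}} \, (1 + o(1))
= \frac{1}{\sqrt{c_1 c_2 c_3}} \cdot \frac{\beta_T^2}{\sqrt{n_T}} \, (1 + o(1)).
$$

Under that assumption, the limit $\rho_T$ is finite and nonzero exactly when $\beta_T^2 = \Theta(\sqrt{n_T})$, that is, when $\beta_T = \Theta(n_T^{1/4})$.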

Abstract

We study the estimation of a planted signal hidden in a recently introduced nested matrix-tensor model, which is an extension of the classical spiked rank-one tensor model, motivated by multi-view clustering. Prior work has theoretically examined the performance of a tensor-based approach, which relies on finding a best rank-one approximation, a problem known to be computationally hard. A tractable alternative approach consists in computing instead the best rank-one (matrix) approximation of an unfolding of the observed tensor data, but its performance was hitherto unknown. We quantify here the performance gap between these two approaches, in particular by deriving the precise algorithmic threshold of the unfolding approach and demonstrating that it exhibits a BBP-type transition behavior. This work is therefore in line with recent contributions which deepen our understanding of why tensor-based methods surpass matrix-based methods in handling structured tensor data.
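
The unfolding approach described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it uses the classical spiked rank-one tensor (which the nested model extends) rather than the nested model itself, the noise normalization $1/\sqrt{n_T}$ with $n_T = n_1 + n_2 + n_3$ is an assumption, and the helper names `unfold`, `unfolding_estimate` and `unit` are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows indexed by `mode`, columns by the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def unfolding_estimate(tensor, mode):
    """Dominant left singular vector and singular value of the unfolding,
    i.e. the left factor of its best rank-one matrix approximation."""
    u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, 0], s[0]


def unit(v):
    """Normalize a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)


rng = np.random.default_rng(0)
n1, n2, n3 = 30, 20, 10
n_T = n1 + n2 + n3  # assumed normalization constant for the noise

# Planted unit-norm factors of the rank-one signal.
x, y, z = (unit(rng.standard_normal(n)) for n in (n1, n2, n3))

beta = 20.0  # illustrative signal strength, chosen well above the noise level
signal = beta * np.einsum("i,j,k->ijk", x, y, z)
noise = rng.standard_normal((n1, n2, n3)) / np.sqrt(n_T)
T = signal + noise

# Estimate y from the unfolding along the second axis (an n2 x n1*n3 matrix).
y_hat, top_singular_value = unfolding_estimate(T, mode=1)
alignment = abs(y_hat @ y)  # sign of a singular vector is arbitrary
print(f"alignment |<y_hat, y>| = {alignment:.3f}")
```

The estimate costs a single SVD of an $n_2 \times n_1 n_3$ matrix, which is what makes the approach tractable compared with a best rank-one tensor approximation. Lowering `beta` in this sketch degrades the alignment, which is the behavior the paper quantifies.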

Paper Structure

This paper contains 35 sections, 14 theorems, 95 equations, 3 figures.

Key Result

Lemma 1

Let ${\textnormal{z}} \sim {\mathcal{N}}(0, 1)$ and $f : \mathbb{R} \to \mathbb{R}$ be a continuously differentiable function. When the following expectations exist, $\mathbb{E} \left[ {\textnormal{z}} f({\textnormal{z}}) \right] = \mathbb{E} \left[ f'({\textnormal{z}}) \right]$.
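
The identity in Lemma 1 can be checked numerically. The sketch below is a Monte Carlo illustration with the test function $f(z) = \sin z$, chosen here only for convenience (it does not come from the paper); for this choice $\mathbb{E}[f'(z)] = \mathbb{E}[\cos z] = e^{-1/2}$, so both sides can also be compared to a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

# Test function f(z) = sin(z), so f'(z) = cos(z).
lhs = np.mean(z * np.sin(z))  # Monte Carlo estimate of E[z f(z)]
rhs = np.mean(np.cos(z))      # Monte Carlo estimate of E[f'(z)]
exact = np.exp(-0.5)          # E[cos z] = exp(-1/2) for z ~ N(0, 1)

print(f"E[z sin z] ~ {lhs:.4f}, E[cos z] ~ {rhs:.4f}, exact = {exact:.4f}")
```

The two estimates agree up to Monte Carlo error, which shrinks like the inverse square root of the sample size.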

Figures (3)

  • Figure 1: Empirical Spectral Distribution (ESD) and Limiting Spectral Distribution (LSD) of ${\mathbf{T}}^{(2)} {\mathbf{T}}^{(2) \top}$ (left) and ${\mathbf{T}}^{(3)} {\mathbf{T}}^{(3) \top}$ (right) with $n_1 = 600$, $n_2 = 400$ and $n_3 = 200$. Both spectra show an isolated eigenvalue close to its predicted asymptotic position, represented by the green dashed line. Left: $\rho_T = 2$, $\beta_M = 1.5$. The centered-and-scaled LSD $\tilde{\nu}$ and spike location $\tilde{\xi}$ are defined in Theorems \ref{thm:lsd2} and \ref{thm:spike2}. Right: $\varrho = 4$, $\beta_M = 3$. The LSD is a shifted-and-rescaled semicircle distribution and the normalized spike location is $\varrho + \frac{1}{\varrho}$, as specified in Theorems \ref{thm:lsd3} and \ref{thm:spike3}.
  • Figure 2: Asymptotic alignment $\zeta^+ = \max(\zeta, 0)$ between the signal ${\bm{y}}$ and the dominant eigenvector of ${\mathbf{T}}^{(2)} {\mathbf{T}}^{(2) \top}$, as defined in Theorem \ref{thm:spike2}, with $c_1 = \frac{1}{2}$, $c_2 = \frac{1}{3}$ and $c_3 = \frac{1}{6}$. The curve $\zeta = 0$ marks the phase transition between the regime where the signal cannot be detected (below) and the regime where the spectrum of ${\mathbf{T}}^{(2)} {\mathbf{T}}^{(2) \top}$ has an isolated eigenvalue whose eigenvector is correlated with the signal (above). As $\rho_T \to +\infty$, this curve approaches the asymptote $\beta_M = (\frac{c_1 c_2}{1 - c_3})^{1 / 4}$, represented by the red dashed line.
  • Figure 3: Empirical versus theoretical multi-view clustering performance with parameters $(p, n, m) = (150, 300, 60)$, varying $\lVert {\bm{\mu}} \rVert$ and two values of $\lVert {\bm{h}} \rVert$: $0.5$ in blue and $1.5$ in orange. The solid curve (O) is an optimistic upper bound given by Theorem \ref{thm:spikeMP}, as it can be reached when the variances along each view are perfectly known. The dash-dotted curve (T) is the performance achieved with a rank-one approximation of ${\bm{\mathsfit{X}}}$ (seddik_nested_2023). The dashed curve (U) is the performance predicted by Theorem \ref{thm:perf} with the unfolding approach.
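
The spike location $\varrho + \frac{1}{\varrho}$ quoted for the right panel of Figure 1 has the same form as the classical result for a rank-one perturbation of a Wigner matrix, whose bulk follows the semicircle law on $[-2, 2]$. The sketch below simulates that classical analog, not the matrix ${\mathbf{T}}^{(3)} {\mathbf{T}}^{(3) \top}$ itself or the paper's centering and scaling; the function name `top_eigenvalue_spiked_wigner` and the matrix size are illustrative choices.

```python
import numpy as np


def top_eigenvalue_spiked_wigner(n, rho, rng):
    """Largest eigenvalue of rho * x x^T + W, where W is a symmetric Gaussian
    matrix with off-diagonal variance 1/n (semicircle bulk on [-2, 2])."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    # eigvalsh returns eigenvalues in ascending order; take the last one.
    return np.linalg.eigvalsh(rho * np.outer(x, x) + w)[-1]


rng = np.random.default_rng(0)
rho = 4.0  # same value of the spike strength as in the right panel of Figure 1
top = top_eigenvalue_spiked_wigner(800, rho, rng)
predicted = rho + 1.0 / rho  # 4.25 for rho = 4

print(f"largest eigenvalue = {top:.3f}, rho + 1/rho = {predicted:.3f}")
```

For $\varrho > 1$ the largest eigenvalue separates from the bulk edge at $2$ and concentrates near $\varrho + \frac{1}{\varrho}$, which is the isolated-eigenvalue behavior shown in the figure.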

Theorems & Definitions (30)

  • Example
  • Lemma 1: Stein's lemma (stein_estimation_1981)
  • Theorem 1: Limiting Spectral Distribution
  • proof
  • Theorem 2: Spike Behavior
  • proof
  • Proposition 1: Phase Transition
  • Remark
  • Theorem 3: Limiting Spectral Distribution
  • Theorem 4: Spike Behavior
  • ...and 20 more