Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model
Hugo Lebeau, Mohamed El Amine Seddik, José Henrique de Morais Goulart
TL;DR
This work analyzes the performance gap between tensor-based and unfolding-based spectral methods for a nested matrix-tensor model used in multi-view clustering. Using random matrix theory, it derives the limiting spectral distributions of the unfoldings and identifies precise spike-detection thresholds. These include a BBP-type transition characterized by $\rho_T = \lim \frac{\beta_T^2 n_T}{\sqrt{n_1 n_2 n_3}}$, and they show that nontrivial recovery from unfoldings requires the regime $\beta_T = \Theta(n_T^{1/4})$. The tensor-based rank-one estimator can achieve recovery at $\Theta(1)$ SNR but is NP-hard to compute, whereas the unfolding approach needs the SNR to grow as $\Theta(n_T^{1/4})$ to detect the signal, which yields a quantifiable gap in achievable clustering accuracy. The results are corroborated by simulations, and the authors discuss how unfolding can serve as a practical initialization for tensor methods. Overall, the paper clarifies when matrix unfoldings suffice and when full tensor spectral methods provide a tangible performance advantage for multi-view clustering.
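A quick numerical check illustrates why unfolding needs $\beta_T = \Theta(n_T^{1/4})$: the finite-size ratio $\beta_T^2 n_T / \sqrt{n_1 n_2 n_3}$ inside $\rho_T$ vanishes when $\beta_T$ stays fixed, and stays of order one when $\beta_T$ grows like $n^{1/4}$. This is a minimal sketch that assumes balanced dimensions $n_1 = n_2 = n_3 = n$ and takes $n_T = n_1 + n_2 + n_3$; that definition of $n_T$ is an assumption, since it is not stated here. The function name `rho_ratio` is hypothetical.

```python
import math

def rho_ratio(beta_T, n1, n2, n3):
    """Finite-size version of rho_T = beta_T^2 * n_T / sqrt(n1 * n2 * n3).

    Assumption (not defined in the TL;DR itself): n_T = n1 + n2 + n3.
    """
    n_T = n1 + n2 + n3
    return beta_T ** 2 * n_T / math.sqrt(n1 * n2 * n3)

# Balanced dimensions n1 = n2 = n3 = n: compare a fixed SNR with beta_T = n^(1/4).
for n in (100, 1_000, 10_000, 100_000):
    fixed = rho_ratio(1.0, n, n, n)         # decays like 3 / sqrt(n)
    scaled = rho_ratio(n ** 0.25, n, n, n)  # stays at 3 up to rounding
    print(f"n={n:>7}  beta_T=1: {fixed:.4f}  beta_T=n^(1/4): {scaled:.4f}")
```

With $n_1 = n_2 = n_3 = n$ the ratio reduces to $3\beta_T^2/\sqrt{n}$, so $\beta_T = n^{1/4}$ keeps it at 3 while $\beta_T = 1$ drives it to zero as $n$ grows.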
Abstract
We study the estimation of a planted signal hidden in a recently introduced nested matrix-tensor model, which is an extension of the classical spiked rank-one tensor model, motivated by multi-view clustering. Prior work has theoretically examined the performance of a tensor-based approach, which relies on finding a best rank-one approximation, a problem known to be computationally hard. A tractable alternative is to compute instead the best rank-one (matrix) approximation of an unfolding of the observed tensor data, but its performance was hitherto unknown. We quantify here the performance gap between these two approaches, in particular by deriving the precise algorithmic threshold of the unfolding approach and demonstrating that it exhibits a BBP-type transition behavior. This work is therefore in line with recent contributions which deepen our understanding of why tensor-based methods surpass matrix-based methods in handling structured tensor data.
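The tractable unfolding approach, computing the best rank-one approximation of an unfolding, can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code. It plants the signal in the classical spiked rank-one tensor model (which the nested model extends) rather than in the nested model itself, assumes the noise scaling $1/\sqrt{n_1 + n_2 + n_3}$, and uses plain power iteration in place of a library SVD. The helper names `spiked_tensor`, `unfold_mode3` and `top_left_singular_vector` are hypothetical.

```python
import math
import random

def spiked_tensor(n1, n2, n3, beta, seed=0):
    """Hypothetical helper: classical rank-one spiked tensor
    T = beta * x (x) y (x) z + W / sqrt(n1 + n2 + n3), with unit-norm x, y, z
    and i.i.d. standard Gaussian W (noise scaling is an assumption). Returns (T, z)."""
    rng = random.Random(seed)

    def unit(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = math.sqrt(sum(a * a for a in v))
        return [a / s for a in v]

    x, y, z = unit(n1), unit(n2), unit(n3)
    scale = 1.0 / math.sqrt(n1 + n2 + n3)
    T = [[[beta * x[i] * y[j] * z[k] + scale * rng.gauss(0.0, 1.0)
           for k in range(n3)] for j in range(n2)] for i in range(n1)]
    return T, z

def unfold_mode3(T):
    """Mode-3 unfolding: an n3 x (n1*n2) matrix; row k lists T[i][j][k] over all (i, j)."""
    n1, n2, n3 = len(T), len(T[0]), len(T[0][0])
    return [[T[i][j][k] for i in range(n1) for j in range(n2)] for k in range(n3)]

def top_left_singular_vector(A, iters=100, seed=1):
    """Power iteration on A A^T: a unit-norm estimate of the top left singular
    vector of A, i.e. the left factor of its best rank-one approximation."""
    rng = random.Random(seed)
    m, p = len(A), len(A[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        v = [sum(A[r][c] * u[r] for r in range(m)) for c in range(p)]  # v = A^T u
        u = [sum(a * b for a, b in zip(row, v)) for row in A]          # u = A v
        norm = math.sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]
    return u

T, z = spiked_tensor(10, 10, 10, beta=10.0)
u = top_left_singular_vector(unfold_mode3(T))
alignment = abs(sum(a * b for a, b in zip(u, z)))  # |<u, z>|; the sign is unidentifiable
print(f"|<u, z>| = {alignment:.3f}")
```

At this strong signal level the leading left singular vector of the mode-3 unfolding aligns closely with the planted $z$; lowering `beta` toward the noise level degrades the alignment. In practice one would call `numpy.linalg.svd` on the unfolding instead of hand-written power iteration.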
