Revisit CP Tensor Decomposition: Statistical Optimality and Fast Convergence
Runshi Tang, Julien Chhor, Olga Klopp, Anru R. Zhang
TL;DR
This work advances the theoretical understanding of CP tensor decomposition under a signal-plus-noise model by establishing non-asymptotic, minimax-optimal error bounds for ALS with a statistically justified warm start. It introduces TASD, a robust Tucker-based initialization followed by simultaneous diagonalization, and proves that TASD-ALS is a statistically consistent estimator with improved stability in high-dimensional, noisy regimes. The authors show that ALS converges in two phases, an initial quadratic phase followed by linear refinement, and that in the rank-one setting it attains the optimal error within one to two iterations under appropriate initialization. Complementing the theory, extensive simulations validate TASD-ALS, reveal threshold phenomena in signal strength, and illustrate faster convergence in near-orthogonal regimes, while identifying limits in high-rank settings and potential extensions to structured tensor models.
Abstract
Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing the CP decomposition due to its simplicity and empirical success, its theoretical foundations, particularly regarding statistical optimality and convergence behavior, remain underdeveloped, especially in noisy, non-orthogonal, and higher-rank settings. In this work, we revisit CP tensor decomposition from a statistical perspective and provide a comprehensive theoretical analysis of ALS under a signal-plus-noise model. We establish non-asymptotic, minimax-optimal error bounds for tensors of general order, dimensions, and rank, assuming suitable initialization. To enable such initialization, we propose Tucker-based Approximation with Simultaneous Diagonalization (TASD), a robust method that improves stability and accuracy in noisy regimes. Combined with ALS, TASD yields a statistically consistent estimator. We further analyze the convergence dynamics of ALS, identifying a two-phase pattern: initial quadratic convergence followed by linear refinement. Finally, we show that in the rank-one setting, ALS with an appropriately chosen initialization attains the optimal error within just one or two iterations.
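
To make the setting concrete, the following is a minimal sketch of ALS for an order-3 CP decomposition under a signal-plus-noise model, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`khatri_rao`, `unfold`, `cp_als`), the dimensions, the noise level, and the iteration count are all hypothetical choices, and TASD is not implemented here. The warm start is simulated by perturbing the true factors, which merely stands in for a "suitable initialization".

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R) and (K x R) -> (J*K x R)."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def unfold(T, mode):
    """Mode-`mode` unfolding; remaining axes keep their original order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def cp_als(T, init, n_iter=20):
    """Plain ALS for an order-3 tensor, started from the factor list `init`."""
    factors = [F.copy() for F in init]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            KR = khatri_rao(others[0], others[1])
            # Solve the least-squares problem unfold(T, mode) ~ F @ KR.T for F.
            sol = np.linalg.lstsq(KR, unfold(T, mode).T, rcond=None)[0]
            factors[mode] = sol.T
    return factors


def reconstruct(factors):
    """Rebuild the tensor sum_r a_r (outer) b_r (outer) c_r from its factors."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


# Synthetic signal-plus-noise data (all sizes and levels are assumptions).
rng = np.random.default_rng(0)
dim, rank, sigma = 20, 2, 0.1
true_factors = [rng.standard_normal((dim, rank)) for _ in range(3)]
signal = reconstruct(true_factors)
observed = signal + sigma * rng.standard_normal(signal.shape)

# Hypothetical warm start: true factors plus a perturbation (stand-in for TASD).
init = [F + 0.3 * rng.standard_normal(F.shape) for F in true_factors]

estimate = cp_als(observed, init)
rel_err = np.linalg.norm(reconstruct(estimate) - signal) / np.linalg.norm(signal)
```

Here `rel_err` measures how far the ALS reconstruction is from the noiseless signal in relative Frobenius norm. The error is computed on the reconstructed tensor rather than on individual factors because CP factors are identifiable only up to permutation and scaling of the components.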
