Revisit CP Tensor Decomposition: Statistical Optimality and Fast Convergence

Runshi Tang, Julien Chhor, Olga Klopp, Anru R. Zhang

TL;DR

This work advances the theoretical understanding of CP tensor decomposition under a signal-plus-noise model by delivering non-asymptotic, minimax-optimal error bounds for ALS with a statistically justified warm start. It introduces TASD, a robust Tucker-based initialization followed by simultaneous diagonalization, and proves that TASD-ALS yields statistically consistent recovery with improved stability in high-dimensional, noisy regimes. The authors show a two-phase convergence for ALS (an initial quadratic phase transitioning to linear refinement) and demonstrate rank-one optimality within one to two iterations under appropriate initialization. Complementing the theory, extensive simulations validate TASD-ALS, reveal threshold phenomena in signal strength, and illustrate faster convergence in near-orthogonal regimes, while identifying limits in high-rank settings and potential extensions to structured tensor models.

Abstract

Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing CP decomposition due to its simplicity and empirical success, its theoretical foundation, particularly regarding statistical optimality and convergence behavior, remains underdeveloped, especially in noisy, non-orthogonal, and higher-rank settings. In this work, we revisit CP tensor decomposition from a statistical perspective and provide a comprehensive theoretical analysis of ALS under a signal-plus-noise model. We establish non-asymptotic, minimax-optimal error bounds for tensors of general order, dimensions, and rank, assuming suitable initialization. To enable such initialization, we propose Tucker-based Approximation with Simultaneous Diagonalization (TASD), a robust method that improves stability and accuracy in noisy regimes. Combined with ALS, TASD yields a statistically consistent estimator. We further analyze the convergence dynamics of ALS, identifying a two-phase pattern: initial quadratic convergence followed by linear refinement. We further show that in the rank-one setting, ALS with an appropriately chosen initialization attains optimal error within just one or two iterations.
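To make the setting in the abstract concrete, here is a minimal sketch of ALS for an order-3 CP model under a signal-plus-noise observation $Y = X + Z$. It is not the authors' implementation: the function names (`unfold`, `khatri_rao`, `cp_als`, `cp_reconstruct`), the simulation parameters, and the warm start (the true factors plus a small perturbation, standing in for a TASD-style initialization) are all assumptions made for illustration.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: bring axis `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(B, C):
    """Column-wise Kronecker product; row (i * C.shape[0] + j) holds B[i, r] * C[j, r]."""
    return np.einsum("ir,jr->ijr", B, C).reshape(-1, B.shape[1])

def cp_als(Y, init, n_iter=25):
    """Plain ALS for an order-3 tensor; the weights are absorbed into the factors."""
    factors = [F.copy() for F in init]
    for _ in range(n_iter):
        for k in range(3):
            others = [factors[j] for j in range(3) if j != k]
            KR = khatri_rao(others[0], others[1])
            # Least-squares solve of unfold(Y, k) ~ F_k @ KR.T for F_k.
            factors[k] = unfold(Y, k) @ np.linalg.pinv(KR).T
    return factors

def cp_reconstruct(factors):
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Hypothetical simulation: dimensions, rank, weights, and noise level are assumptions.
rng = np.random.default_rng(0)
p, R, sigma = 20, 3, 0.05
lam = np.array([10.0, 8.0, 6.0])
truth = []
for _ in range(3):
    F = rng.standard_normal((p, R))
    truth.append(F / np.linalg.norm(F, axis=0))   # unit-norm, non-orthogonal columns
X = np.einsum("r,ir,jr,kr->ijk", lam, *truth)     # low-rank signal
Y = X + sigma * rng.standard_normal(X.shape)      # signal plus Gaussian noise

# Warm start: truth plus a small perturbation (a stand-in for a real initializer).
warm = [F + 0.1 * rng.standard_normal(F.shape) / np.sqrt(p) for F in truth]
est = cp_als(Y, warm)
rel_err = np.linalg.norm(cp_reconstruct(est) - X) / np.linalg.norm(X)
noise_rel = np.linalg.norm(Y - X) / np.linalg.norm(X)
```

Comparing `rel_err` with `noise_rel` gives a rough sense of how much the low-rank fit denoises the observation. The sketch uses a pseudoinverse for each least-squares step for brevity; a solver such as `np.linalg.lstsq` would be an equally reasonable choice.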

Paper Structure

This paper contains 64 sections, 18 theorems, 297 equations, 6 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

In model eq_main_model_R1, let $\hat{a}_k^{(0)}$ be calculated by eq_initialization_formula_R1, $\hat{a}_k^{(t)}$ be calculated iteratively by eq_update_formula_R1, and define $\varepsilon_t$ by eq_error_formula_R1. There exist absolute constants $C_1$, $C_2$, and $C_3$ such that if $p_{\min} \geq C_{\cdots}$ ... Moreover, the following holds for $t \geq 2$ with probability at least $1-\exp(-C_2 p_{\cdots})$: If we addi...
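The referenced formulas (eq_main_model_R1, eq_initialization_formula_R1, eq_update_formula_R1, eq_error_formula_R1) are not reproduced on this page, so the following is only a sketch of the general shape of rank-one ALS, under assumptions: the model is taken to be $Y = \lambda\, a_1 \circ \cdots \circ a_d + Z$ with unit vectors $a_k$, the update is the standard normalized contraction (tensor power iteration), the initialization is the leading left singular vector of each unfolding, and the error is the largest sine distance across modes. None of these choices is confirmed to match the paper's definitions, and all function names are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def contract_all_but(Y, vecs, k):
    """Contract Y with vecs[j] along every mode j != k; returns a length-p_k vector."""
    out = Y
    # Iterate from the last mode to the first so earlier axis indices stay valid.
    for j in reversed(range(Y.ndim)):
        if j != k:
            out = np.tensordot(out, vecs[j], axes=([j], [0]))
    return out

def spectral_init(Y):
    """Assumed initializer: leading left singular vector of each unfolding."""
    return [np.linalg.svd(unfold(Y, k), full_matrices=False)[0][:, 0]
            for k in range(Y.ndim)]

def rank_one_als(Y, init, n_iter=2):
    """Rank-one ALS: each update is a normalized contraction with the other modes."""
    vecs = [v / np.linalg.norm(v) for v in init]
    for _ in range(n_iter):
        for k in range(Y.ndim):
            v = contract_all_but(Y, vecs, k)
            vecs[k] = v / np.linalg.norm(v)
    lam_hat = float(contract_all_but(Y, vecs, 0) @ vecs[0])
    return lam_hat, vecs

def max_sin_error(est, truth):
    """Assumed error measure: largest sine of the angle between estimate and truth."""
    errs = []
    for a_hat, a in zip(est, truth):
        c = abs(a_hat @ a) / (np.linalg.norm(a_hat) * np.linalg.norm(a))
        errs.append(np.sqrt(max(0.0, 1.0 - c ** 2)))
    return max(errs)

# Hypothetical simulation: p, d, the signal strength, and unit noise are assumptions.
rng = np.random.default_rng(1)
p, d, lam = 30, 3, 60.0
truth = []
for _ in range(d):
    a = rng.standard_normal(p)
    truth.append(a / np.linalg.norm(a))
X = lam * np.einsum("i,j,k->ijk", *truth)
Y = X + rng.standard_normal(X.shape)

init = spectral_init(Y)
err_init = max_sin_error(init, truth)
lam_hat, est = rank_one_als(Y, init, n_iter=2)
err_final = max_sin_error(est, truth)
```

Running the loop with `n_iter` set to 1, 2, and larger values, and recording `max_sin_error` each time, is one way to explore how quickly the error stabilizes in a simulation like this one.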

Figures (6)

  • Figure 1: Loss comparison over varying rank $R$ and noise level $\sigma$.
  • Figure 2: Scatter (log‑scale) of loss versus $\alpha$ for TASD-ALS; solid line shows median.
  • Figure 3: Proportion of number of iterations required by R1-ALS to yield $\varepsilon_t < 0.05$. "0 iteration" corresponds to the initialization.
  • Figure 4: Proportion of number of iterations required by TASD-ALS to yield $\varepsilon_t < 0.05$. "0 iteration" corresponds to the initialization.
  • Figure 5: Loss comparison over varying rank $R$ and noise level $\sigma$.
  • ...and 1 more figure

Theorems & Definitions (33)

  • Theorem 1: Rank-One Case: Iteration Error
  • Remark 1: Theory of Rank-One Tensor Decomposition
  • Remark 2: TASD vs. Existing Work
  • Theorem 2: Convergence of ALS
  • Theorem 3: Lower Bound
  • Corollary 1: Estimation Error of $\widehat{|\lambda_r|}$ and $\hat{{\mathbf{X}}}$
  • Remark 3: Comparison with Existing Work
  • Proposition 1: TASD: Noiseless Setting
  • Theorem 4: Theory of TASD
  • Theorem 5: Estimation Error of HOOI for General Order-$d$ Tensor Decomposition
  • ...and 23 more