The power of small initialization in noisy low-tubal-rank tensor recovery

Zhiyu Liu, Haobo Geng, Xudong Wang, Yandong Tang, Zhi Han, Yao Wang

Abstract

We study the problem of recovering a low-tubal-rank tensor $\mathcal{X}_\star\in \mathbb{R}^{n \times n \times k}$ from noisy linear measurements under the t-product framework. A widely adopted strategy factorizes the optimization variable as $\mathcal{U} * \mathcal{U}^\top$, where $\mathcal{U} \in \mathbb{R}^{n \times R \times k}$, and then applies factorized gradient descent (FGD) to the resulting optimization problem. Since the tubal-rank $r$ of the underlying tensor $\mathcal{X}_\star$ is typically unknown, this method often assumes $r < R \le n$, a regime known as over-parameterization. However, when the measurements are corrupted by dense noise (e.g., Gaussian noise), FGD with the commonly used spectral initialization yields a recovery error that grows linearly with the overestimated tubal-rank $R$. To address this issue, we show that using a small initialization enables FGD to achieve a nearly minimax optimal recovery error, even when the tubal-rank $R$ is significantly overestimated. Using a four-stage analytic framework, we analyze this phenomenon and establish the sharpest known error bound to date, which is independent of the overestimated tubal-rank $R$. Furthermore, we provide a theoretical guarantee showing that an easy-to-use early stopping strategy can achieve the best known result in practice. All these theoretical findings are validated through a series of simulations and real-data experiments.
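The setup described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`t_product`, `t_transpose`, `tensor_spectral_norm`, `fgd_small_init`), the Gaussian measurement tensors, and all parameter values (`alpha`, `eta`, `T`, `m`, the problem sizes) are chosen here for illustration. The loss $\frac{1}{4m}\|y - \mathfrak{M}(\mathcal{U}*\mathcal{U}^\top)\|^2$ follows the training error given in the Figure 1 caption.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x k) and B (n2 x n3 x k): slice-wise matrix
    products in the Fourier domain along the third axis."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum("ijk,jlk->ilk", Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A):
    """Transpose every frontal slice, then reverse the order of slices 2..k."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def tensor_spectral_norm(X):
    """Largest singular value over all Fourier-domain frontal slices."""
    Xf = np.fft.fft(X, axis=2)
    return max(np.linalg.norm(Xf[:, :, j], 2) for j in range(X.shape[2]))

def fgd_small_init(A, y, R, alpha=1e-4, eta=0.1, T=1000, seed=0):
    """FGD on f(U) = 1/(4m) * ||y - M(U * U^T)||^2 from a small random start.
    A has shape (m, n, n, k); measurement i is the inner product <A[i], X>.
    alpha, eta and T are illustrative values, not the paper's choices."""
    m, n, _, k = A.shape
    A_mat = A.reshape(m, -1)
    rng = np.random.default_rng(seed)
    U = alpha * rng.standard_normal((n, R, k))   # small initialization
    losses = []
    for _ in range(T):
        X = t_product(U, t_transpose(U))
        resid = A_mat @ X.ravel() - y
        losses.append(resid @ resid / (4 * m))
        G = (A_mat.T @ resid).reshape(n, n, k) / m
        G = 0.5 * (G + t_transpose(G))           # symmetrize under the t-transpose
        U = U - eta * t_product(G, U)            # gradient step on U
    return U, losses

# Small synthetic instance (all sizes are assumptions chosen to run quickly).
rng = np.random.default_rng(1)
n, k, r, R, m, sigma = 8, 3, 1, 3, 2000, 1e-3
U_star = rng.standard_normal((n, r, k))
X_star = t_product(U_star, t_transpose(U_star))
X_star /= tensor_spectral_norm(X_star)           # normalize so eta = 0.1 is a safe step
A = rng.standard_normal((m, n, n, k))
y = A.reshape(m, -1) @ X_star.ravel() + sigma * rng.standard_normal(m)

U_hat, losses = fgd_small_init(A, y, R)
X_hat = t_product(U_hat, t_transpose(U_hat))
rel_err = np.linalg.norm(X_hat - X_star) ** 2 / np.linalg.norm(X_star) ** 2
print(f"final training loss {losses[-1]:.3e}, relative error {rel_err:.3e}")
```

Here `R = 3` overestimates the true tubal-rank `r = 1`, which is the over-parameterized regime the abstract refers to. Rescaling `X_star` to unit tensor spectral norm is a convenience for choosing the step size and is not taken from the paper.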

Paper Structure (39 sections, 22 theorems, 190 equations, 15 figures, 7 tables, 3 algorithms)

Key Result

Theorem 1

Let $\bm{\mathcal{Y}}\in\mathbb{R}^{m\times n \times k}$, then it can be factored as $\bm{\mathcal{Y}}=\bm{\mathcal{V}}_{\bm{\mathcal{Y}}} * \bm{\mathcal{S}}_{\bm{\mathcal{Y}}} * \bm{\mathcal{W}}_{\bm{\mathcal{Y}}}^\top$ where $\bm{\mathcal{V}}_{\bm{\mathcal{Y}}}\in\mathbb{R}^{m \times m \times k}$,
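A factorization of this kind is commonly computed one frontal slice at a time in the Fourier domain. The sketch below assumes the standard t-SVD shapes for a real input ($\bm{\mathcal{V}}$ of size $m \times m \times k$, an f-diagonal $\bm{\mathcal{S}}$ of size $m \times n \times k$, and $\bm{\mathcal{W}}$ of size $n \times n \times k$); the function names `t_svd` and `tubal_rank` and the tolerance are illustrative and not taken from the paper.

```python
import numpy as np

def t_product(A, B):
    """t-product via slice-wise products in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", Af, Bf), axis=2))

def t_transpose(A):
    """Transpose every frontal slice, then reverse the order of slices 2..k."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_svd(Y):
    """t-SVD of a real tensor Y (m x n x k): Y = V * S * W^T."""
    m, n, k = Y.shape
    p = min(m, n)
    Yf = np.fft.fft(Y, axis=2)
    Vf = np.zeros((m, m, k), dtype=complex)
    Sf = np.zeros((m, n, k), dtype=complex)
    Wf = np.zeros((n, n, k), dtype=complex)
    for j in range(k // 2 + 1):
        slice_j = Yf[:, :, j]
        if j == 0 or 2 * j == k:
            slice_j = slice_j.real          # these slices are real when Y is real
        v, s, wh = np.linalg.svd(slice_j, full_matrices=True)
        Vf[:, :, j] = v
        Sf[np.arange(p), np.arange(p), j] = s
        Wf[:, :, j] = wh.conj().T
    # Fill the remaining slices by conjugate symmetry so the inverse FFT is real.
    for j in range(k // 2 + 1, k):
        Vf[:, :, j] = Vf[:, :, k - j].conj()
        Sf[:, :, j] = Sf[:, :, k - j]
        Wf[:, :, j] = Wf[:, :, k - j].conj()
    back = lambda Tf: np.real(np.fft.ifft(Tf, axis=2))
    return back(Vf), back(Sf), back(Wf)

def tubal_rank(S, tol=1e-8):
    """Number of diagonal tubes S[i, i, :] that are numerically nonzero."""
    p = min(S.shape[0], S.shape[1])
    tube_norms = np.array([np.linalg.norm(S[i, i, :]) for i in range(p)])
    return int(np.sum(tube_norms > tol * tube_norms.max()))

# Usage: a 5 x 4 x 3 tensor built as a t-product of two thin random factors.
rng = np.random.default_rng(0)
Y = t_product(rng.standard_normal((5, 2, 3)), rng.standard_normal((2, 4, 3)))
V, S, W = t_svd(Y)
Y_rec = t_product(t_product(V, S), t_transpose(W))
print("reconstruction error:", np.linalg.norm(Y - Y_rec))
print("tubal rank:", tubal_rank(S))
```

Only about half of the Fourier slices are decomposed explicitly; the rest are filled in by conjugate symmetry, which keeps the three factors real-valued.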

Figures (15)

  • Figure 1: Comparison of training and testing errors for Problem (\ref{equ:3}) using FGD with spectral vs. small initialization. The ground-truth tensor has tubal-rank $r=2$, overestimated rank $R=4$, size $n=20$, $k=3$, $m=5kr(2n-r)$ measurements, and noise $\sigma=10^{-3}$. Spectral initialization follows liu2024low, while small initialization uses a near-zero starting point. Training error is $\frac{1}{4m}||\bm{y}-\bm{\mathfrak{M}}(\bm{\mathcal{U}}*\bm{\mathcal{U}}^\top)||^2$, and testing error is $||\bm{\mathcal{U}}*\bm{\mathcal{U}}^\top - \bm{\mathcal{X}}_\star||_F^2/||\bm{\mathcal{X}}_\star||_F^2$. “Baseline” denotes recovery under exact rank $R=r$. Insets show early (first 500 iterations) vs. full error curves.
  • Figure 2: Performance comparison under varying $R$, $\sigma$, $n$, and $m$. Subfigure (a) illustrates the recovery error of all methods under different over-rank values $R$, with parameters set as $m = 10nrk$, $n = 30$, $\sigma = 10^{-3}$, $\eta = 0.1$, and $T = 5000$. Subfigure (b) illustrates the error under varying noise levels $\sigma$, with $m = 10nrk$, $n = 30$, $R = 3r$, $\eta = 0.1$, and $T = 5000$. Subfigure (c) illustrates the error as the problem dimension $n$ changes, where $m = 10nrk$, $R = 3r$, $\eta = 0.1$, $T = 20000$, and $\sigma = 10^{-3}$. Subfigure (d) illustrates the performance under different measurement factors $C_m$, with $m = 2C_m nrk$, $n = 30$, $R = 3r$, $\eta = 0.01$, $T = 20000$, and $\sigma = 10^{-3}$.
  • Figure 3: Validation of the algorithm with $m = 10nrk$, $R = 3r$, $n = 30$, $\sigma = 10^{-3}$, $\eta = 0.1$. (a) Validation loss vs. RSE, with the blue dot marking the minimum. (b) Error of the validation-based method compared with the minimum errors of baseline and small-initialization under varying $m_{\text{train}}$.
  • Figure 4: Validation of the sensitivity of FGD to different tubal-ranks.
  • Figure 5: Validation of the four-phase convergence analysis in Section 3.3. The left panel shows the first 1,000 iterations; the right panel shows the full 10,000 iterations. The orange curve corresponds to the orange axis on the right, and the blue curve corresponds to the blue axis on the left. Parameter settings: $n=10$, $k=3$, $r=2$, $R=10$, $m=5knR$, $\eta=0.1$, noise standard deviation $\sigma=0.01$, and initialization scale $\alpha=10^{-7}$.
  • ...and 10 more figures
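The testing error defined in the Figure 1 caption and the validation-based selection of an iterate shown in Figure 3 can be illustrated with a generic hold-out scheme. The sketch below is an assumption-laden illustration rather than the paper's procedure, which this excerpt does not spell out: the train/validation split, the function names (`fgd_with_validation`, `relative_error`), the Gaussian measurements, and every parameter value are hypothetical.

```python
import numpy as np

def t_product(A, B):
    """t-product via slice-wise products in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum("ijk,jlk->ilk", Af, Bf), axis=2))

def t_transpose(A):
    """Transpose every frontal slice, then reverse the order of slices 2..k."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def relative_error(U, X_star):
    """Testing error from the Figure 1 caption: ||U*U^T - X_star||_F^2 / ||X_star||_F^2."""
    X = t_product(U, t_transpose(U))
    return np.linalg.norm(X - X_star) ** 2 / np.linalg.norm(X_star) ** 2

def fgd_with_validation(A_tr, y_tr, A_val, y_val, R, alpha=1e-4, eta=0.1, T=1000, seed=0):
    """Run FGD on the training measurements and keep the iterate with the
    smallest held-out loss 1/(4 m_val) * ||y_val - M_val(U * U^T)||^2."""
    m_tr, n, _, k = A_tr.shape
    A_tr_mat = A_tr.reshape(m_tr, -1)
    A_val_mat = A_val.reshape(A_val.shape[0], -1)
    rng = np.random.default_rng(seed)
    U = alpha * rng.standard_normal((n, R, k))        # small initialization
    best_U, best_val, best_t = U.copy(), np.inf, 0
    val_losses = []
    for t in range(T):
        x = t_product(U, t_transpose(U)).ravel()
        val = np.mean((A_val_mat @ x - y_val) ** 2) / 4
        val_losses.append(val)
        if val < best_val:                             # remember the best iterate so far
            best_U, best_val, best_t = U.copy(), val, t
        resid = A_tr_mat @ x - y_tr
        G = (A_tr_mat.T @ resid).reshape(n, n, k) / m_tr
        U = U - eta * t_product(0.5 * (G + t_transpose(G)), U)
    return best_U, best_t, val_losses

# Synthetic instance (sizes, noise level and split are illustrative assumptions).
rng = np.random.default_rng(2)
n, k, r, R, sigma = 6, 3, 1, 3, 1e-2
m_tr, m_val = 1200, 300
U_star = rng.standard_normal((n, r, k))
X_star = t_product(U_star, t_transpose(U_star))
Xf = np.fft.fft(X_star, axis=2)
X_star /= max(np.linalg.norm(Xf[:, :, j], 2) for j in range(k))  # unit tensor spectral norm
A = rng.standard_normal((m_tr + m_val, n, n, k))
y = A.reshape(m_tr + m_val, -1) @ X_star.ravel() + sigma * rng.standard_normal(m_tr + m_val)

best_U, best_t, val_losses = fgd_with_validation(A[:m_tr], y[:m_tr], A[m_tr:], y[m_tr:], R)
print("selected iteration:", best_t)
print("relative error at selected iterate:", relative_error(best_U, X_star))
```

The validation loss needs only held-out measurements, whereas the relative error needs the ground truth, so only the former can drive a stopping rule in practice.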

Theorems & Definitions (54)

  • Theorem 1: t-SVD kilmer2011factorization
  • Definition 1: t-RIP zhang2021tensor
  • Theorem 2
  • Remark 1
  • Theorem 3: Minimax error
  • Corollary 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • ...and 44 more