CP Degeneracy in Tensor Regression

Ya Zhou, Raymond K. W. Wong, Kejun He

TL;DR

The paper addresses CP degeneracy in tensor regression, where the CP-structured parameter space is not closed and the minimal loss may be unattainable. It shows that degeneracy drives CP parameters to diverge along iterative optimization paths and that a CP-level penalty $\lambda g(\bm{\theta})$ can restore well-posedness, providing nonasymptotic and asymptotic guarantees without requiring the existence of a best low-rank approximation. The results establish conditions under which the penalized estimator is attainable and achieves favorable error rates, and they highlight that penalizing the coefficient tensor $\mathbf{A}$ directly may fail. Numerical experiments corroborate that CP-level penalties reduce degeneracy, yielding more stable and accurate tensor regression in high-dimensional settings.

Abstract

Tensor linear regression is an important and useful tool for analyzing tensor data. To deal with high dimensionality, CANDECOMP/PARAFAC (CP) low-rank constraints are often imposed on the coefficient tensor parameter in the (penalized) $M$-estimation. However, we show that the corresponding optimization may not be attainable, and when this happens, the estimator is not well-defined. This is closely related to a phenomenon, called CP degeneracy, in low-rank tensor approximation problems. In this article, we provide useful results of CP degeneracy in tensor regression problems. In addition, we provide a general penalized strategy as a solution to overcome CP degeneracy. The asymptotic properties of the resulting estimation are also studied. Numerical experiments are conducted to illustrate our findings.
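The degeneracy phenomenon mentioned above can be illustrated with a classical example from low-rank tensor approximation, independent of any regression setting. The sketch below is not taken from the paper: the vectors `a` and `b`, the helper `outer3`, and the "magnitude" measure (the sum of Frobenius norms of the rank-one terms) are illustrative assumptions and need not match the paper's $\mathcal{M}(\bm{\theta})$. It builds a rank-3 tensor that rank-2 CP tensors approximate arbitrarily well, but only with component norms that grow without bound, so the best rank-2 approximation is not attained.

```python
import numpy as np

def outer3(u, v, w):
    """Rank-one 3-way tensor with entries u[i] * v[j] * w[k]."""
    return np.einsum("i,j,k->ijk", u, v, w)

# Illustrative choice of two linearly independent vectors (an assumption).
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Target tensor of CP rank 3: a.a.b + a.b.a + b.a.a
target = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

def rank2_terms(n):
    """Two rank-one terms whose sum approaches `target` as n grows."""
    c = a + b / n
    return n * outer3(c, c, c), -n * outer3(a, a, a)

results = []
for n in [1, 10, 100, 1000]:
    t1, t2 = rank2_terms(n)
    # Approximation error of the rank-2 tensor.
    err = np.linalg.norm(t1 + t2 - target)
    # Illustrative magnitude: total size of the two rank-one components.
    mag = np.linalg.norm(t1) + np.linalg.norm(t2)
    results.append((n, err, mag))
    print(f"n={n:5d}  error={err:.6f}  component magnitude={mag:.2f}")
```

As `n` increases, the error shrinks on the order of $1/n$ while the two rank-one components grow on the order of $n$ and nearly cancel each other. This is the kind of divergence that a penalty on the CP parameters, rather than on the assembled tensor, is meant to prevent.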

Paper Structure

This paper contains 20 sections, 12 theorems, 112 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Suppose where $\mathcal{S}$ is the set of solutions of the tensor linear model as defined in eqn:example:3:defS. If $R_b \le R < R_m$, then the optimization problems eqn:opt_openset and eqn:def:f have no solution.

Figures (2)

  • Figure 1: Examples of numerical experiments. Each panel plots the magnitude $\mathcal{M}(\bm{\theta}_t)$ against the iteration number $t$. The top row corresponds to Case 1a with $(n, p, R, R_0)=(200,5,2,3)$, and the bottom row to Case 2a with $(n, p, R, R_0)=(100,5,3,3)$. The first column groups the results for \ref{eqn:def:f} and for \ref{eq:opt_ridge} with $\alpha = 0.001, 0.01, 0.1$, distinguished by color. The second column shows the results for \ref{eqn:penalized:CPlevel} with $\lambda = 0.001, 0.01, 0.1$, distinguished by color.
  • Figure 2: An example of a numerical experiment for Case 1a with $(n, p, R, R_0)=(200,5,2,3)$. The top row plots the magnitude $\mathcal{M}(\bm{\theta}_t)$ against the iteration number $t$ for $t\ge 50000$. The bottom row plots the average increments $(\mathcal{M}(\bm{\theta}_{t+100}) - \mathcal{M}(\bm{\theta}_{t}))/100$ and the fitted values $h_{\hat{a}, \hat{b},\hat{c}}(t)$ against $t$. The first column groups the results for \ref{eqn:def:f} and for \ref{eq:opt_ridge} with $\alpha = 0.001, 0.01, 0.1$, distinguished by color and point shape. The second column shows the results for \ref{eqn:penalized:CPlevel} with $\lambda = 0.001, 0.01, 0.1$, distinguished by color and point shape. The dotted lines of matching color in the bottom row are the corresponding fitted curves from \ref{eqn:def:habc}.

Theorems & Definitions (30)

  • Example 1
  • Lemma 1
  • Example 2
  • Example 3
  • Theorem 1
  • Theorem 2
  • Definition 1: Sub-Gaussian random variables (Vershynin, 2018)
  • Definition 2: Sub-Gaussian random vectors (Vershynin, 2018)
  • Definition 3: Gaussian width
  • Theorem 3
  • ...and 20 more