Low-rank optimization on Tucker tensor varieties

Bin Gao, Renfeng Peng, Ya-xiang Yuan

TL;DR

This paper addresses optimization over Tucker tensor varieties $\mathcal{M}_{\le\mathbf{r}}$, where the Tucker rank is bounded, by deriving an explicit tangent-cone description and leveraging it to design geometry-guided gradient methods. It introduces a gradient-related approximate projection (GRAP) and a retraction-free variant (rfGRAP), along with a Tucker-rank adaptive method (TRAM) that can increase or decrease rank during iterations, all with convergence guarantees via the Łojasiewicz framework. Central to the approach is an explicit characterization of the tangent cone and practical projection operators, including an approximate projection that avoids expensive exact metric projections while preserving descent. Empirical results on tensor completion demonstrate that GRAP, rfGRAP, and especially TRAM outperform state-of-the-art Tucker-based methods across varying rank choices and data regimes, with TRAM effectively identifying appropriate ranks in practice.

Abstract

In tensor optimization, the low-rank Tucker decomposition is crucial for reducing the number of parameters and for saving storage. We explore the geometry of Tucker tensor varieties -- the set of tensors with bounded Tucker rank -- which is notably more intricate than the well-explored matrix varieties. We give an explicit parametrization of the tangent cone of Tucker tensor varieties and leverage its geometry to develop provable gradient-related line-search methods for optimization on Tucker tensor varieties. To the best of our knowledge, this is the first work concerning geometry and optimization on Tucker tensor varieties. In practice, low-rank tensor optimization suffers from the difficulty of choosing a reliable rank parameter. To this end, we incorporate the established geometry and propose a Tucker rank-adaptive method that aims to identify an appropriate rank with guaranteed convergence. Numerical experiments on tensor completion show that the proposed methods achieve better recovery performance than other state-of-the-art methods. The rank-adaptive method performs best across various rank parameter selections and is indeed able to find an appropriate rank.
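To make the notions of "bounded Tucker rank" and "saving storage" concrete, here is a minimal NumPy sketch. It assumes the usual definition of the Tucker rank as the tuple of matrix ranks of the mode-k unfoldings; the helper names (`unfold`, `tucker_rank`, `in_variety`) and the tensor sizes are illustrative choices, not taken from the paper.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker_rank(tensor):
    """Tucker rank, taken here as the matrix ranks of all mode-k unfoldings."""
    return tuple(
        int(np.linalg.matrix_rank(unfold(tensor, k))) for k in range(tensor.ndim)
    )


def in_variety(tensor, r):
    """Membership test for the set of tensors with Tucker rank bounded by r."""
    return all(rk <= bound for rk, bound in zip(tucker_rank(tensor), r))


# Hypothetical example: a 10 x 12 x 14 tensor built from a 2 x 3 x 4 core.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 4))
U1 = rng.standard_normal((10, 2))
U2 = rng.standard_normal((12, 3))
U3 = rng.standard_normal((14, 4))
X = np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)

full_params = X.size                                    # dense storage
tucker_params = core.size + U1.size + U2.size + U3.size  # core plus factors

print(tucker_rank(X), full_params, tucker_params)
```

In this toy setting the dense tensor stores 1680 entries, while the core and the three factor matrices together store 136, which is the kind of saving the abstract refers to.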

Paper Structure (26 sections, 10 theorems, 84 equations, 4 figures, 2 algorithms)

Key Result

Proposition

Let $\mathbf{X}\in\mathbb{R}_{\underline{r}}^{m\times n}$ with $\underline{r}\leq r$, and let $\mathbf{X}=\mathbf{U}\Sigma\mathbf{V}^\mathsf{T}$ be a thin SVD of $\mathbf{X}$, where $\mathbf{U}\in\mathrm{St}(\underline{r},m)$, $\mathbf{V}\in\mathrm{St}(\underline{r},n)$ and $\Sigma=\mathop{\mathrm{diag}}(\ldots)$ [...] for any $\mathbf{U}^\perp\in\mathrm{St}(m-\underline{r},m)$ with $\mathrm{span}(\mathbf{U}^\perp)=\ldots$

Figures (4)

  • Figure 1: Illustration of an element in $\mathop{\mathrm{T}}\nolimits_{\mathbf{X}}\!\mathbb{R}^{m\times n}_{\leq r}$ with parameters $\mathbf{U}_1\in\mathrm{St}(r-\underline{r},m),\mathbf{V}_1\in\mathrm{St}(r-\underline{r},n),\mathbf{U}_2\in\mathrm{St}(m-r,m),\mathbf{V}_2\in\mathrm{St}(n-r,n)$ satisfying $[\mathbf{U}\ \mathbf{U}_1\ \mathbf{U}_2]\in\mathcal{O}(m)$ and $[\mathbf{V}\ \mathbf{V}_1\ \mathbf{V}_2]\in\mathcal{O}(n)$
  • Figure 2: Tucker decomposition of a third-order tensor
  • Figure 3: Illustration of a tangent vector in $\mathop{\mathrm{T}}\nolimits_\mathcal{X}\!\mathcal{M}_{{\underline{\mathbf{r}}}}$ at $\mathcal{X}=\mathcal{G}\times_{k=1}^d\mathbf{U}_k$ for $d=3$. $\underline{G}_k:=\mathcal{G}\times_k\dot{\mathbf{R}}_k$ with arbitrary $\dot{\mathbf{R}}_k\in\mathbb{R}^{(n_k-\underline{r}_k)\times \underline{r}_k}$
  • Figure 4: Illustration of an element in $\mathop{\mathrm{T}}\nolimits_\mathcal{X}\!\mathcal{M}_{\leq\mathbf{r}}$ at $\mathcal{X}=\mathcal{G}\times_{k=1}^d\mathbf{U}_k$ for $d=3$. $G_k:=\mathcal{G}\times_k\mathbf{R}_{k,2}$ with parameters $\mathbf{R}_{k,2}\in\mathbb{R}^{(n_k-r_k)\times\underline{r}_k}$, $\mathbf{U}_{k,1}\in\mathrm{St}(r_k-\underline{r}_k,n_k)$ and $\mathbf{U}_{k,2}\in\mathrm{St}(n_k-r_k,n_k)$ satisfying $[\mathbf{U}_k\ \mathbf{U}_{k,1}\ \mathbf{U}_{k,2}]\in\mathcal{O}(n_k)$ for $k\in[d]$
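The notation in the captions above can be sketched numerically. The NumPy example below builds $\mathcal{X}=\mathcal{G}\times_{k=1}^d\mathbf{U}_k$ for $d=3$ and then splits an orthogonal complement of each $\mathbf{U}_k$ into blocks $\mathbf{U}_{k,1}$ and $\mathbf{U}_{k,2}$ so that $[\mathbf{U}_k\ \mathbf{U}_{k,1}\ \mathbf{U}_{k,2}]$ is orthogonal, as in Figure 4. The function names, the sizes, and the use of a complete QR factorization to obtain the complement are assumptions made for illustration; the paper may construct these blocks differently.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Mode-k product: apply `matrix` along axis `mode` of `tensor`."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(moved, 0, mode)


def random_stiefel(n, p, rng):
    """Random n x p matrix with orthonormal columns (a point on St(p, n))."""
    q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return q


def split_complement(u, r):
    """Split an orthonormal basis of span(u)^perp into blocks with
    (r - r_low) and (n - r) columns, where u is n x r_low."""
    r_low = u.shape[1]
    q, _ = np.linalg.qr(u, mode="complete")  # first r_low columns span span(u)
    u_perp = q[:, r_low:]
    return u_perp[:, : r - r_low], u_perp[:, r - r_low:]


# Hypothetical sizes: dimensions n_k, actual ranks r_low_k, rank bounds r_k.
rng = np.random.default_rng(0)
dims, r_low, r = (10, 12, 14), (2, 3, 4), (4, 5, 6)

core = rng.standard_normal(r_low)
factors = [random_stiefel(n, p, rng) for n, p in zip(dims, r_low)]

# X = G x_1 U_1 x_2 U_2 x_3 U_3
X = core
for k, u in enumerate(factors):
    X = mode_product(X, u, k)

# [U_k  U_{k,1}  U_{k,2}] for each mode k
frames = [np.hstack([u, *split_complement(u, rk)]) for u, rk in zip(factors, r)]
```

Each entry of `frames` is a square matrix whose columns are orthonormal, so the three blocks together form an orthogonal basis of $\mathbb{R}^{n_k}$ adapted to the column space of $\mathbf{U}_k$.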

Theorems & Definitions (20)

  • Definition
  • Proposition
  • Remark
  • Definition: Tucker decomposition
  • Proposition
  • Proof
  • Lemma
  • Theorem 1
  • Proof
  • Corollary
  • ...and 10 more