A quadratically convergent proximal algorithm for nonnegative tensor decomposition

Nico Vervliet, Andreas Themelis, Panagiotis Patrinos, Lieven De Lathauwer

Abstract

The decomposition of tensors into simple rank-1 terms is key in a variety of applications in signal processing, data analysis and machine learning. While this canonical polyadic decomposition (CPD) is unique under mild conditions, including prior knowledge such as nonnegativity can facilitate interpretation of the components. Inspired by the effectiveness and efficiency of Gauss-Newton (GN) for unconstrained CPD, we derive a proximal, semismooth GN-type algorithm for nonnegative tensor factorization. If the algorithm converges to the global optimum, we show that $Q$-quadratic convergence can be obtained in the exact case. Global convergence is achieved via backtracking on the forward-backward envelope function. The $Q$-quadratic convergence is verified experimentally, and we illustrate that using the GN step significantly reduces the number of (expensive) gradient computations compared to proximal gradient descent.
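To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of one proximal gradient step for a nonnegative CPD of a third-order tensor. It is not the proposed GN-type algorithm. It assumes the least-squares objective $f = \tfrac{1}{2}\|\mathcal{T} - \sum_r \bm{a}_r \circ \bm{b}_r \circ \bm{c}_r\|_F^2$ and a fixed step size `gamma`; the function names `cpd_reconstruct`, `cpd_gradients` and `prox_grad_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def cpd_reconstruct(A, B, C):
    # Sum of R rank-1 terms: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cpd_gradients(T, A, B, C):
    # Gradients of f = 0.5 * ||reconstruct(A, B, C) - T||_F^2
    # with respect to each factor matrix (assumed objective).
    E = cpd_reconstruct(A, B, C) - T
    gA = np.einsum("ijk,jr,kr->ir", E, B, C)
    gB = np.einsum("ijk,ir,kr->jr", E, A, C)
    gC = np.einsum("ijk,ir,jr->kr", E, A, B)
    return gA, gB, gC

def prox_grad_step(T, factors, gamma):
    # Forward (gradient) step followed by the backward (proximal) step.
    # The prox of the nonnegativity indicator is the projection max(., 0).
    grads = cpd_gradients(T, *factors)
    return [np.maximum(F - gamma * g, 0.0) for F, g in zip(factors, grads)]
```

Each call evaluates the full gradient once, which is the expensive operation the abstract refers to. In the exact case, factors that reproduce the tensor are a fixed point of this step, since the gradient vanishes and the projection leaves nonnegative entries unchanged.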

Paper Structure

This paper contains 9 sections, 5 theorems, 24 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

Let $\bm{x}^\star$ be such that $F(\bm{x}^\star)=\bm{0}$, and suppose that all matrices in $\hat{J}\mathcal{R}_\gamma(\bm{x}^\star)$ are nonsingular. Then, there exists $\varepsilon>0$ such that the iterations with $\hat{{\mathbf{H}}}_k$ being any element of $\hat{J}\mathcal{R}_\gamma(\bm{x}^k)$, are $Q$-quadratically conve

Figures (2)

  • Figure 1: Up to $Q$-quadratic convergence can be achieved near the global optimum if an exact, unique solution exists ($F(\bm{x}^{\star})=\bm{0}$). The histogram shows the slope for the penultimate iteration for 500 experiments. The convergence curves for ten randomly chosen experiments have a slope close to 2. Results shown for a rank-5, $10\times10\times10$ tensor with a unique and nonnegative CPD, starting close to the global optimum.
  • Figure 2: By using (approximate) second-order information as in the proposed algorithm, fewer gradients are computed compared to proximal gradient descent. Histograms created using 250 experiments.
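
The slope reported in Figure 1 can be read as an empirical convergence order. Recall that a sequence converges $Q$-quadratically if the errors $e_k = \|\bm{x}^k - \bm{x}^\star\|$ satisfy $e_{k+1} \le c\, e_k^2$ for some constant $c$ and all sufficiently large $k$. A minimal sketch of one common way to estimate the order from a recorded error sequence follows; whether the figure uses exactly this estimator is an assumption, and the function name `estimated_orders` is hypothetical.

```python
import numpy as np

def estimated_orders(errors):
    # Slope of log(e_{k+1}) against log(e_k) between consecutive iterates:
    #   p_k = log(e_{k+1} / e_k) / log(e_k / e_{k-1}).
    # Values near 2 suggest quadratic convergence, values near 1 linear.
    # Requires strictly positive, strictly decreasing errors.
    e = np.asarray(errors, dtype=float)
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])
```

Under this reading, the slope "for the penultimate iteration" would be one of the last entries of the returned array. The estimate is only meaningful while the errors stay well above machine precision.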

Theorems & Definitions (9)

  • Theorem 1: Local quadratic convergence
  • proof
  • Lemma 2: Kernel of Gramian
  • proof
  • Theorem 3
  • proof
  • Lemma 4: Basic properties of the FBE
  • proof
  • Theorem 5: Convergence of Algorithm 1