Joint Majorization-Minimization for Nonnegative CP and Tucker Decompositions under $β$-Divergences: Unfolding-Free Updates

Valentin Leplat

TL;DR

This paper addresses efficient, unfolding-free optimization of nonnegative CP and Tucker tensor decompositions under β-divergence losses. The author derives contraction-based majorization-minimization updates that express every numerator and denominator as a tensor contraction, so the updates can be implemented directly with einsum-style operations and no explicit unfoldings. The main advance is a joint majorization strategy: a single surrogate is built at a reference point and then decreased by cheap inner updates that reuse cached reference quantities, which yields substantial practical speedups. The paper proves that the majorizers are tight, establishes monotonic descent of the objective, and shows that the sequence of objective values converges; BSUM-based arguments are discussed for limit points. Experiments on synthetic data and a real Uber pickups tensor show significant wall-clock gains over unfolding-based baselines and competitive performance against an einsum-factorization framework, highlighting the approach's scalability and its potential for large-scale nonnegative multilinear factorization.
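The joint majorization idea can be illustrated in its simplest setting. The sketch below is our own hypothetical instance, restricted to the KL case ($\beta=1$) and an order-3 CP model, and is not the paper's algorithm: Jensen's inequality applied at a reference point gives a surrogate whose block minimizers are closed form, so the inner loop reuses the cached arrays `NA`, `NB`, `NC` and only recomputes column sums. The names `joint_mm_kl_cp`, `cp3`, `n_outer` and `n_inner` are illustrative.

```python
import numpy as np


def kl_divergence(X, Xhat, eps=1e-12):
    """Generalized KL divergence (beta = 1) summed over all entries."""
    Xhat = np.maximum(Xhat, eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Convention: x * log(x / xhat) = 0 whenever x = 0.
        xlog = np.where(X > 0, X * np.log(X / Xhat), 0.0)
    return float(np.sum(xlog - X + Xhat))


def cp3(A, B, C):
    """Order-3 CP reconstruction as one contraction (no unfoldings)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def joint_mm_kl_cp(X, A, B, C, n_outer=30, n_inner=5, eps=1e-12):
    """Hypothetical joint-MM sketch for KL (beta = 1) nonnegative CP, order 3."""
    losses = [kl_divergence(X, cp3(A, B, C))]
    for _ in range(n_outer):
        # Build the Jensen surrogate once at the reference point (A, B, C);
        # its only data-sized ingredient is the ratio X / Xhat_ref.
        ratio = X / np.maximum(cp3(A, B, C), eps)
        NA = A * np.einsum("ijk,jr,kr->ir", ratio, B, C)  # weight of log a_ir
        NB = B * np.einsum("ijk,ir,kr->jr", ratio, A, C)  # weight of log b_jr
        NC = C * np.einsum("ijk,ir,jr->kr", ratio, A, B)  # weight of log c_kr
        # Inner loop: exact block minimizers of that fixed surrogate. They use
        # only the cached arrays and column sums; X is not touched here.
        for _ in range(n_inner):
            A = NA / np.maximum(B.sum(axis=0) * C.sum(axis=0), eps)
            B = NB / np.maximum(A.sum(axis=0) * C.sum(axis=0), eps)
            C = NC / np.maximum(A.sum(axis=0) * B.sum(axis=0), eps)
        losses.append(kl_divergence(X, cp3(A, B, C)))
    return A, B, C, losses


# Usage on a small synthetic count tensor (Poisson noise around a CP model).
rng = np.random.default_rng(0)
I, J, K, R = 8, 7, 6, 3
X = rng.poisson(5.0 * cp3(rng.random((I, R)), rng.random((J, R)),
                          rng.random((K, R)))).astype(float)
A0, B0, C0 = (rng.random((n, R)) + 0.1 for n in (I, J, K))
A, B, C, losses = joint_mm_kl_cp(X, A0, B0, C0)
```

Because every inner update exactly minimizes the same surrogate in one block, the objective after an outer step is bounded by the surrogate value, which in turn is bounded by the objective at the reference point, so the recorded losses should be non-increasing. The contractions with `X` happen once per outer iteration instead of once per block update.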

Abstract

We study majorization-minimization methods for nonnegative tensor decompositions under the $β$-divergence family, focusing on nonnegative CP and Tucker models. Our aim is to avoid explicit mode unfoldings and large auxiliary matrices by deriving separable surrogates whose multiplicative updates can be implemented using only tensor contractions (einsum-style operations). We present both classical block-MM updates in contraction-only form and a joint majorization strategy, inspired by joint MM for matrix $β$-NMF, that reuses cached reference quantities across inexpensive inner updates. We prove tightness of the proposed majorizers, establish monotonic decrease of the objective, and show convergence of the sequence of objective values; we also discuss how BSUM theory applies to the block-MM scheme for analyzing limit points. Finally, experiments on synthetic tensors and the Uber spatiotemporal count tensor demonstrate substantial speedups over unfolding-based baselines and a recent einsum-factorization framework.
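To make "contraction-only" concrete, the sketch below writes classical block-MM multiplicative updates for an order-3 nonnegative CP model with NumPy's `einsum`: the numerator and denominator of each factor update are contractions of $X\odot\widehat{X}^{\beta-2}$ and $\widehat{X}^{\beta-1}$ with the other two factors, so no mode unfolding or Khatri-Rao matrix is formed. It assumes the standard $\beta$-NMF form of the update with an exponent $\gamma(\beta)$ (the paper's exact statements are not reproduced in this summary); `cp_block_mm`, `SPECS` and the other names are hypothetical.

```python
import numpy as np


def beta_divergence(X, Xhat, beta, eps=1e-12):
    """Sum of entrywise beta-divergences d_beta(x | xhat).

    Entries are clamped to eps so logs and negative powers stay finite.
    """
    X, Xhat = np.maximum(X, eps), np.maximum(Xhat, eps)
    if beta == 0:  # Itakura-Saito case
        r = X / Xhat
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler case
        return float(np.sum(X * np.log(X / Xhat) - X + Xhat))
    return float(np.sum((X**beta + (beta - 1.0) * Xhat**beta
                         - beta * X * Xhat**(beta - 1.0)) / (beta * (beta - 1.0))))


def gamma_exponent(beta):
    """Exponent that makes the multiplicative beta-NMF update an MM step."""
    if beta < 1.0:
        return 1.0 / (2.0 - beta)
    if beta <= 2.0:
        return 1.0
    return 1.0 / (beta - 1.0)


# Contraction taking an I x J x K tensor to the shape of factor n,
# using the two other factors in their natural order.
SPECS = ("ijk,jr,kr->ir", "ijk,ir,kr->jr", "ijk,ir,jr->kr")


def cp_reconstruct(factors):
    return np.einsum("ir,jr,kr->ijk", *factors)


def cp_block_mm(X, factors, beta, n_iter=40, eps=1e-12):
    """Block-MM multiplicative updates for order-3 nonnegative CP (sketch)."""
    factors = [F.copy() for F in factors]
    g = gamma_exponent(beta)
    losses = [beta_divergence(X, cp_reconstruct(factors), beta)]
    for _ in range(n_iter):
        for n in range(3):
            Xhat = np.maximum(cp_reconstruct(factors), eps)
            others = [factors[m] for m in range(3) if m != n]
            num = np.einsum(SPECS[n], X * Xhat**(beta - 2.0), *others)
            den = np.einsum(SPECS[n], Xhat**(beta - 1.0), *others)
            factors[n] = factors[n] * (num / np.maximum(den, eps)) ** g
        losses.append(beta_divergence(X, cp_reconstruct(factors), beta))
    return factors, losses


# Usage: a positive rank-3 CP tensor plus a small offset, three beta values.
rng = np.random.default_rng(1)
true = [rng.random((n, 3)) for n in (8, 7, 6)]
X = cp_reconstruct(true) + 0.05
init = [rng.random((n, 3)) + 0.1 for n in X.shape]
results = {beta: cp_block_mm(X, init, beta) for beta in (0.5, 1.0, 1.5)}
```

The einsum specs take the place of unfold-and-multiply; for higher orders one would generate the spec strings programmatically instead of hard-coding three of them.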

Paper Structure (109 sections, 9 theorems, 117 equations, 3 figures, 2 algorithms)

Keywords.
Introduction
Contributions.
Paper organization.
Background and Related Work
$\beta$-divergences and majorization--minimization.
Tensor decompositions under divergence losses.
Einsum-based multiplicative updates beyond CP/Tucker.
Joint MM for $\beta$-NMF and extension to multilinear models.
Preliminaries
Notation
Mode-$n$ product.
Model reconstruction.
$\beta$-divergence
Models
...and 94 more sections

Key Result

Proposition 1

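Let $G(\cdot\mid\tilde{\theta})$ be a majorization-minimization surrogate of $F$ at $\tilde{\theta}$ (Definition 2). If $\theta^{+}\in\arg\min_\theta G(\theta\mid \tilde{\theta})$, then $F(\theta^{+})\le F(\tilde{\theta})$.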
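Assuming the surrogate majorizes the objective, $G(\theta\mid\tilde{\theta})\ge F(\theta)$ for all $\theta$, and is tight at the reference point, $G(\tilde{\theta}\mid\tilde{\theta})=F(\tilde{\theta})$, the descent property follows from a single chain of inequalities:

```latex
F(\theta^{+}) \;\le\; G(\theta^{+}\mid\tilde{\theta}) \;\le\; G(\tilde{\theta}\mid\tilde{\theta}) \;=\; F(\tilde{\theta}).
```

The first inequality is majorization, the second uses that $\theta^{+}$ minimizes the surrogate, and the equality is tightness. Only the second step involves $\theta^{+}$, and it holds for any $\theta^{+}$ with $G(\theta^{+}\mid\tilde{\theta})\le G(\tilde{\theta}\mid\tilde{\theta})$, so decreasing the surrogate (rather than minimizing it exactly) already suffices for descent.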

Figures (3)

Figure 1: Synthetic CP benchmark (order $4$, size $80{\times}70{\times}60{\times}50$, rank $R=10$). Each row corresponds to one $\beta\in\{0.5,1,1.5\}$ and shows the mean normalized loss $\bar{D}_\beta(X,\widehat{X})$ versus iteration (left) and wall-clock CPU time (right), averaged over $5$ random seeds (shaded band: variability). For runtime fairness, NumPy/BLAS is restricted to one thread; NNEinFact is additionally reported for Torch CPU threads $1/4/8$ (see legend).
Figure 2: Synthetic Tucker benchmark (order $4$, size $80{\times}70{\times}60{\times}50$, multilinear ranks $(10,9,8,7)$). Each row corresponds to one $\beta\in\{0.5,1,1.5\}$ and shows the mean normalized loss $\bar{D}_\beta(X,\widehat{X})$ versus iteration (left) and wall-clock CPU time (right), averaged over $5$ random seeds (shaded band: variability). For runtime fairness, NumPy/BLAS is restricted to one thread; NNEinFact is additionally reported for Torch CPU threads $1/4/8$ (see legend).
Figure 3: Uber pickups tensor ($27\times 24\times 7\times 100\times 100$): Tucker fit with ranks $(10,10,5,10,10)$. Each row corresponds to one $\beta\in\{1/2,1,3/2\}$ and shows the normalized objective (mean $\beta$-divergence per entry) versus outer iteration (left) and wall-clock CPU time (right). All methods are run in a single-thread CPU configuration (NumPy/BLAS and PyTorch restricted to one thread).
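The captions report the loss as a mean per entry. Assuming Definition 1 is the standard $\beta$-divergence (its statement is not reproduced in this summary) and that the normalized loss in Figures 1 and 2 uses the same per-entry averaging as Figure 3, the plotted quantity for a tensor of size $I_1\times\cdots\times I_N$ reads:

```latex
d_\beta(x \mid y) =
\begin{cases}
\dfrac{x}{y} - \log\dfrac{x}{y} - 1, & \beta = 0,\\[4pt]
x \log\dfrac{x}{y} - x + y, & \beta = 1,\\[4pt]
\dfrac{x^{\beta} + (\beta - 1)\, y^{\beta} - \beta\, x\, y^{\beta - 1}}{\beta(\beta - 1)}, & \text{otherwise},
\end{cases}
\qquad
\bar{D}_\beta(X, \widehat{X}) = \frac{1}{I_1 I_2 \cdots I_N} \sum_{i_1, \dots, i_N} d_\beta\big(x_{i_1 \dots i_N} \mid \widehat{x}_{i_1 \dots i_N}\big).
```

So $\beta=1/2$ and $\beta=3/2$ use the third branch and $\beta=1$ the KL branch. For the Uber tensor the divisor is $27\cdot 24\cdot 7\cdot 100\cdot 100 = 45{,}360{,}000$ entries.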

Theorems & Definitions (25)

Definition 1: $\beta$-divergence
Definition 2: Majorization-minimization surrogate
Proposition 1: Monotonic descent
Remark 1: When does a contraction produce an $I\times R\times R$ tensor?
Remark 2: Exponent $\gamma(\beta)$
Theorem 1: CP block multiplicative update
proof
Theorem 2: Tucker block multiplicative updates
proof
Proposition 2: Inner updates decrease the joint surrogate
...and 15 more
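For the Tucker model, a hypothetical sketch of block-MM multiplicative updates for an order-3 nonnegative Tucker decomposition expresses each factor update and the core update as a pair of einsum contractions, again without unfoldings. It assumes the standard $\beta$-NMF update form with exponent $\gamma(\beta)$; each block subproblem is a $\beta$-NMF problem with a nonnegative fixed dictionary (for the core, the Kronecker product of the factors), so the per-block MM guarantee carries over. `UPDATE_SPECS`, `tucker_block_mm` and the other names are illustrative, not the paper's code.

```python
import numpy as np


def beta_divergence(X, Xhat, beta, eps=1e-12):
    """Sum of entrywise beta-divergences (entries clamped to eps)."""
    X, Xhat = np.maximum(X, eps), np.maximum(Xhat, eps)
    if beta == 0:
        r = X / Xhat
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:
        return float(np.sum(X * np.log(X / Xhat) - X + Xhat))
    return float(np.sum((X**beta + (beta - 1.0) * Xhat**beta
                         - beta * X * Xhat**(beta - 1.0)) / (beta * (beta - 1.0))))


def gamma_exponent(beta):
    """Exponent that makes the multiplicative update an MM step."""
    if beta < 1.0:
        return 1.0 / (2.0 - beta)
    return 1.0 if beta <= 2.0 else 1.0 / (beta - 1.0)


def tucker_reconstruct(blocks):
    """Order-3 Tucker model: core G contracted with A, B, C in one einsum."""
    return np.einsum("abc,ia,jb,kc->ijk", blocks["G"], blocks["A"],
                     blocks["B"], blocks["C"], optimize=True)


# For each block: the contraction mapping a data-sized tensor to that block's
# shape, and the other blocks it consumes (in the order of the spec).
UPDATE_SPECS = {
    "A": ("ijk,abc,jb,kc->ia", ("G", "B", "C")),
    "B": ("ijk,abc,ia,kc->jb", ("G", "A", "C")),
    "C": ("ijk,abc,ia,jb->kc", ("G", "A", "B")),
    "G": ("ijk,ia,jb,kc->abc", ("A", "B", "C")),
}


def tucker_block_mm(X, blocks, beta, n_iter=30, eps=1e-12):
    """Block-MM multiplicative updates for order-3 nonnegative Tucker (sketch)."""
    blocks = {k: v.copy() for k, v in blocks.items()}
    g = gamma_exponent(beta)
    losses = [beta_divergence(X, tucker_reconstruct(blocks), beta)]
    for _ in range(n_iter):
        for name in ("A", "B", "C", "G"):
            spec, other_names = UPDATE_SPECS[name]
            others = [blocks[o] for o in other_names]
            Xhat = np.maximum(tucker_reconstruct(blocks), eps)
            num = np.einsum(spec, X * Xhat**(beta - 2.0), *others, optimize=True)
            den = np.einsum(spec, Xhat**(beta - 1.0), *others, optimize=True)
            blocks[name] = blocks[name] * (num / np.maximum(den, eps)) ** g
        losses.append(beta_divergence(X, tucker_reconstruct(blocks), beta))
    return blocks, losses


def random_blocks(rng, dims, ranks, offset):
    """Random nonnegative core and factors, shifted by offset."""
    blocks = {"G": rng.random(ranks) + offset}
    for key, d, r in zip("ABC", dims, ranks):
        blocks[key] = rng.random((d, r)) + offset
    return blocks


# Usage: positive Tucker data with multilinear ranks (3, 3, 2).
rng = np.random.default_rng(2)
dims, ranks = (8, 7, 6), (3, 3, 2)
X = tucker_reconstruct(random_blocks(rng, dims, ranks, 0.0)) + 0.05
init = random_blocks(rng, dims, ranks, 0.1)
results = {beta: tucker_block_mm(X, init, beta) for beta in (0.5, 1.0, 1.5)}
```

Keeping the specs in one table makes the symmetry between factor and core updates explicit: only the output indices and the list of consumed blocks change.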
