Efficient Leverage Score Sampling for Tensor Train Decomposition

Vivek Bharadwaj; Beheshteh T. Rakhshan; Osman Asif Malik; Guillaume Rabusseau

TL;DR

The paper tackles the computational bottleneck of TT-ALS for high-order tensors by introducing rTT-ALS, a sampling-based variant of TT-ALS that draws rows by exact leverage-score sampling with the help of a purpose-built data structure. Keeping the TT in canonical form makes the left and right TT core chains orthonormal, which gives Φ = I and reduces leverage-score sampling to sampling from the squared-row-norm distribution. The data structure supports this with construction time $O\left(\sum_{n=1}^j I_n R_{n-1} R_n^2\right)$ and per-sample time $O\left(\sum_{k=1}^j \log\left(I_k R_{k-1}/R_k\right) R_k^2\right)$, which is $O(j R^2 \log I)$ in the uniform case (all $I_k = I$ and all $R_k = R$). The authors provide a rigorous proof outline for the sampling procedure and demonstrate empirically that rTT-ALS achieves speedups of up to about 16× over non-randomized TT-ALS, with competitive accuracy, on both dense and sparse tensors, including massive real-world datasets. This approach enables scalable TT decompositions for large-scale ML and physics applications and suggests extensions to other tensor-network architectures that exploit canonical forms for efficient sampling.
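The link between the canonical form and squared row norms can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `random_left_orthonormal_cores` and `unfold_chain`, the dimensions, and the ranks are all illustrative assumptions. It builds random cores whose left-matricizations have orthonormal columns, unfolds the chain explicitly (feasible only for tiny tensors), and confirms that the Gram matrix is the identity and that the exact leverage scores coincide with the squared row norms.

```python
import numpy as np

def random_left_orthonormal_cores(dims, ranks, rng):
    """Random cores A_k of shape (R_{k-1}, I_k, R_k), with R_0 = 1, whose
    left-matricizations (R_{k-1} * I_k) x R_k have orthonormal columns.
    Assumes R_{k-1} * I_k >= R_k for every k (hypothetical helper)."""
    cores, r_prev = [], 1
    for i_k, r_k in zip(dims, ranks):
        q, _ = np.linalg.qr(rng.standard_normal((r_prev * i_k, r_k)))
        cores.append(q.reshape(r_prev, i_k, r_k))
        r_prev = r_k
    return cores

def unfold_chain(cores):
    """Contract A_1 - ... - A_j and unfold into a (prod I_k) x R_j matrix."""
    chain = cores[0].reshape(-1, cores[0].shape[2])          # (I_1, R_1)
    for core in cores[1:]:
        r_prev, i_k, r_k = core.shape
        chain = (chain @ core.reshape(r_prev, i_k * r_k)).reshape(-1, r_k)
    return chain

rng = np.random.default_rng(0)
cores = random_left_orthonormal_cores(dims=[4, 5, 3], ranks=[3, 4, 2], rng=rng)
A = unfold_chain(cores)                                       # shape (60, 2)

gram = A.T @ A                                                # identity for orthonormal chains
# Exact leverage scores: diagonal of A (A^T A)^+ A^T.
leverage = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(gram), A)
row_norms_sq = (A ** 2).sum(axis=1)

print(np.allclose(gram, np.eye(A.shape[1])))
print(np.allclose(leverage, row_norms_sq))
```

Because the Gram matrix is the identity, no pseudo-inverse is needed in practice; it appears here only to show that the general leverage-score formula collapses to the row norms.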

Abstract

Tensor Train (TT) decomposition is widely used in the machine learning and quantum physics communities as a tool to efficiently compress high-dimensional tensor data. In this paper, we propose an efficient algorithm that accelerates computing the TT decomposition with the Alternating Least Squares (ALS) algorithm by relying on exact leverage score sampling. For this purpose, we propose a data structure that allows us to efficiently sample from the tensor with time complexity logarithmic in the tensor size. Our contribution specifically leverages the canonical form of the TT decomposition. By maintaining the canonical form through each iteration of ALS, we can efficiently compute (and sample from) the leverage scores, thus achieving a significant speed-up in solving each sketched least-squares problem. Experiments on synthetic and real data, on both dense and sparse tensors, demonstrate that our method outperforms SVD-based and ALS-based algorithms.
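To illustrate what a sketched least-squares solve looks like, here is a hedged sketch of leverage score sampling on a generic tall matrix. It is not the rTT-ALS algorithm: a dense random matrix stands in for the TT-ALS design matrix, the leverage scores are computed by a full QR factorization (the cost that the proposed data structure is meant to avoid), and the function names, sizes, and sample count are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of a full-column-rank matrix: squared row norms
    of an orthonormal basis for its column space."""
    q, _ = np.linalg.qr(A)
    return (q ** 2).sum(axis=1)

def sampled_least_squares(A, b, num_samples, rng):
    """Approximately solve min_x ||A x - b||_2 using rows drawn i.i.d. with
    probability proportional to their leverage scores."""
    probs = leverage_scores(A)
    probs = probs / probs.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    # Importance weights keep the sketched objective unbiased.
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    x, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 6))                 # stand-in design matrix
b = A @ rng.standard_normal(6) + 0.1 * rng.standard_normal(5000)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)    # full solve, for comparison
x_sampled = sampled_least_squares(A, b, num_samples=500, rng=rng)

print(np.linalg.norm(A @ x_exact - b), np.linalg.norm(A @ x_sampled - b))
```

The sketched solve touches only 500 of the 5000 rows, and its residual on the full problem can be compared directly against the exact one.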

Paper Structure (23 sections, 7 theorems, 20 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Related work
Preliminaries
Tensor Train Decomposition
Alternating Least Squares with Tensor Train Structure
Sketching and Leverage Score Sampling
Sampling-based Tensor Train Decomposition
Efficient Core Chain Leverage Score Sampling
Experiments
Decomposition of Synthetic and Real Dense Datasets
Approximate Sparse Tensor Train Decomposition
Conclusion
Additional Notations
Details about Orthogonalization of the TT Decomposition
Proofs
...and 8 more sections

Key Result

Theorem 1.1

Let $\mathcal{A}_1, \ldots, \mathcal{A}_j$ be a sequence of 3D tensors, $\mathcal{A}_k \in \mathbb{R}^{R_{k-1} \times I_k \times R_k}$ (with $R_0=1$). Assume that the left-matricization of each core is orthogonal. Let $A_{\leq j}$ be the $\left(\prod_{k=1}^j I_k\right) \times R_j$ matrix obtained by unfolding the c
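In the setting of this theorem, rows of $A_{\leq j}$ can be drawn from the squared-row-norm distribution without forming the matrix. The sketch below is a naive illustration under stated assumptions, not the data structure from the paper: it picks a column uniformly, then samples $i_j, i_{j-1}, \ldots, i_1$ one at a time, using the orthonormality of each left-matricization to marginalize over the indices not yet drawn. Each step costs time linear in $I_k$ rather than logarithmic, and the names `left_orthonormal_cores` and `sample_row` are hypothetical.

```python
import numpy as np

def left_orthonormal_cores(dims, ranks, rng):
    """Random cores of shape (R_{k-1}, I_k, R_k), R_0 = 1, with orthonormal
    left-matricizations. Assumes R_{k-1} * I_k >= R_k (hypothetical helper)."""
    cores, r_prev = [], 1
    for i_k, r_k in zip(dims, ranks):
        q, _ = np.linalg.qr(rng.standard_normal((r_prev * i_k, r_k)))
        cores.append(q.reshape(r_prev, i_k, r_k))
        r_prev = r_k
    return cores

def sample_row(cores, rng):
    """Draw (i_1, ..., i_j) with probability ||A_{<=j}[(i_1..i_j), :]||^2 / R_j."""
    r_j = cores[-1].shape[2]
    h = np.zeros(r_j)
    h[rng.integers(r_j)] = 1.0              # choose one column of A_{<=j} uniformly
    indices = []
    for core in reversed(cores):            # draw i_j first, then i_{j-1}, ..., i_1
        v = core @ h                        # (R_{k-1}, I_k); column i is A_k[:, i, :] h
        probs = (v ** 2).sum(axis=0)        # sums to ||h||^2 = 1 by orthonormality
        i_k = rng.choice(len(probs), p=probs / probs.sum())
        indices.append(int(i_k))
        h = v[:, i_k] / np.sqrt(probs[i_k]) # condition on the index just drawn
    return tuple(reversed(indices))

rng = np.random.default_rng(0)
cores = left_orthonormal_cores(dims=[3, 3, 3], ranks=[2, 3, 2], rng=rng)
sample = sample_row(cores, rng)             # one multi-index (i_1, i_2, i_3)
print(sample)
```

Averaging over the uniformly chosen column is what turns the per-column distribution into the full squared-row-norm distribution, since the squared norm of a row is the sum of its squared entries.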

Figures (7)

Figure 1: Tensor Train decomposition of a 5-dimensional tensor in tensor network notation.
Figure 2: Orthonormal TT decomposition. The cores at the left side of $\mathcal{A}_3$ are left-orthonormal and the cores at the right are right-orthonormal.
Figure 3: Fit (left) and running time (right) averaged over 5 trials for the synthetic data experiment.
Figure 4: Fit as a function of time for three FROSTT tensors, $R=6$, $J=2^{16}$ for rTT-ALS. Thick lines are averages of 5 fit-time traces, shown by thin dotted lines.
Figure 5: Final fit of sparse tensor decomposition for varying sample counts. Each boxplot reports statistics for 5 trials. The blue dashed lines show the fit for non-randomized ALS.
...and 2 more figures

Theorems & Definitions (17)

Theorem 1.1: Row-norm-squared sampling for 3D core chains
Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Theorem 3.5
Lemma 4.1: Conditional distribution for $\hat{s}_k$
Lemma 4.2
Lemma 4.3: Adapted from Bharadwaj et al. (2023)
Corollary 4.4
...and 7 more
