
A Practical Mode-parallel Implementation of the (H-)Tucker Decomposition via Randomization

Martina Iannacito, Sascha Portaro, Davide Palitta, Claudio Arlandini, Domitilla Brandoni

Abstract

In recent decades, tensors have emerged as the right tool to represent multidimensional data in a compact yet informative manner. Moreover, it is well known that by performing low-rank factorizations of such tensors one is often able to effectively unveil possible hidden structure in data, mainly due to unexpected dependencies among the different variables encoded in the given tensor. However, computing these factorizations is extremely energy-consuming and memory-demanding, especially for high-dimensional tensors, namely those with a large number of modes. In this paper we focus on two state-of-the-art tensor decompositions: the Tucker and H-Tucker decompositions. We propose novel numerical strategies able to perform these factorizations in a \emph{mode-parallel} fashion, that is, the operations required by the algorithm along all modes are performed in parallel. This is in contrast to many procedures available in the literature, which parallelize some of the operations along each mode, e.g., tensor-times-matrix steps, while still visiting one mode at a time in a sequential manner. Our strategies make use of cutting-edge randomization techniques comprising fiber sampling and randomized range-finding steps. We provide upper bounds on the expected value of the error incurred by our factorizations, while a panel of numerical results showcases the potential of our approach in reducing both the running time and the storage demand of the whole procedure. Moreover, experiments carried out in HPC environments illustrate the good scaling of our mode-parallel approach.
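To make the mode-parallel idea concrete, the following is a minimal NumPy sketch and not the authors' Sub-R-HOSVD: it assumes uniform fiber sampling without replacement, a plain Gaussian sketch as the range finder, and Python threads as a stand-in for the processes used in an HPC setting. The names `mode_factor`, `mode_parallel_tucker`, `reconstruct`, and the parameter `n_fibers` are hypothetical. The point it illustrates is that each factor matrix depends only on the original tensor, so all modes can be processed concurrently instead of one after another.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index the chosen mode, columns are its fibers."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_factor(tensor, mode, rank, n_fibers, seed):
    """Orthonormal basis for one mode from sampled fibers and a Gaussian sketch."""
    rng = np.random.default_rng(seed)
    mat = unfold(tensor, mode)
    # Fiber sampling (assumed uniform): keep a random subset of the columns.
    n_cols = mat.shape[1]
    cols = rng.choice(n_cols, size=min(n_fibers, n_cols), replace=False)
    sampled = mat[:, cols]
    # Randomized range finder: sketch the sampled fibers, then orthonormalize.
    omega = rng.standard_normal((sampled.shape[1], rank))
    q, _ = np.linalg.qr(sampled @ omega)
    return q


def mode_parallel_tucker(tensor, ranks, n_fibers, seed=0):
    """Compute all factor matrices concurrently, then project to get the core."""
    with ThreadPoolExecutor() as pool:
        # Every task reads only the original tensor, so the modes are independent.
        futures = [
            pool.submit(mode_factor, tensor, m, ranks[m], n_fibers, seed + m)
            for m in range(tensor.ndim)
        ]
        factors = [f.result() for f in futures]
    core = tensor
    for m, u in enumerate(factors):
        # Tensor-times-matrix product with u^T along mode m.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its mode."""
    out = core
    for m, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, m)), 0, m)
    return out
```

In this sketch only the factor computations run concurrently; the core is still formed by a sequence of tensor-times-matrix products. A sequential variant that truncates the tensor after each mode could not be parallelized this way, because each step would depend on the previous one.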

Paper Structure (19 sections, 5 theorems, 44 equations, 7 figures, 4 algorithms)


Key Result

Lemma 3.1

Let ${\mathbf{X}}={\mathbf{U\Sigma V}}^T$ with ${\mathbf{V}}=[{\mathbf{V}}_1,{\mathbf{V}}_2]$, ${\mathbf{V}}_1\in \mathbb{R}^{m \times r}$, ${\mathbf{V}}_2\in\mathbb{R}^{m \times (n-r)}$ as in partition. Define the quantities $M_i := m\cdot\mu({\mathbf{V}}_i)$, $i=1,2$. Se with failure probability at most $r \cdot \left ( \frac{\mathtt e^{-\delta}}{(1-\delta)^{1-\delta}}

Figures (7)

  • Figure 1: Parallel performance of Sub-R-HOSVD. The dashed line with slope 1 denotes the ideal linear speed-up.
  • Figure 1: Dimension tree for an order-$8$ tensor with transfer tensors enumerated by heap indexing.
  • Figure 2: Parallel performance of Sub-R-HOSVD with a parallel computation of the indices for the fiber sampling. The dashed line with slope 1 denotes the ideal linear speed-up.
  • Figure 2: Comparison of the considered methods on synthetic tensors with i.i.d. entries drawn from a standard normal distribution. All the randomized algorithms are tested over $25$ independent runs.
  • Figure 3: Parallel performance of Sub-R-LtR-HT using $1$, $2$, $4$ and $8$ processes. The dashed line with slope 1 denotes the ideal linear speed-up.
  • ...and 2 more figures

Theorems & Definitions (13)

  • Definition 2.1
  • Definition 2.2
  • Lemma 3.1
  • Proof 1
  • Lemma 3.2
  • Proof 2
  • Theorem 3.3
  • Proof 3
  • Theorem 3.4
  • Proof 4
  • ...and 3 more