Approximation and learning with compositional tensor trains

Martin Eigel, Charles Miranda, Anthony Nouy, David Sommer

TL;DR

The paper introduces compositional tensor trains (CTTs), which compose low-rank tensor-train (TT) layers to combine expressivity and efficiency in high-dimensional function approximation. It formalizes the CTT framework, demonstrates universal approximation capabilities under mild assumptions on the bases, and provides compression guarantees for layer-wise representations. Two optimization strategies are developed: an approach based on the Pontryagin maximum principle (via the method of successive approximations) and a layerwise natural-gradient method, both leveraging tensor structure and stable regularization. Numerical experiments validate the approach on regression tasks, illustrate the benefits of low-rank implementations, and explore the role of random sketching in managing ill-conditioned Gram matrices. Overall, the work proposes a scalable, expressive alternative to standard deep neural networks by combining compositional models with tensor-algebraic computation.

Abstract

We introduce compositional tensor trains (CTTs) for the approximation of multivariate functions, a class of models obtained by composing low-rank functions in the tensor-train format. This format can encode standard approximation tools, such as (sparse) polynomials, deep neural networks (DNNs) with fixed width, or tensor networks with arbitrary permutation of the inputs, or more general affine coordinate transformations, with similar complexities. This format can be viewed as a DNN with width exponential in the input dimension and structured weight matrices. Compared to DNNs, this format enables controlled compression at the layer level using efficient tensor algebra. On the optimization side, we derive a layerwise algorithm inspired by natural gradient descent, which makes it possible to exploit efficient low-rank tensor algebra. This relies on low-rank estimations of Gram matrices and tensor-structured random sketching. Viewing the format as a discrete dynamical system, we also derive an optimization algorithm inspired by numerical methods in optimal control. Numerical experiments on regression tasks demonstrate the expressivity of the new format and the relevance of the proposed optimization algorithms. Overall, CTTs combine the expressivity of compositional models with the algorithmic efficiency of tensor algebra, offering a scalable alternative to standard deep neural networks.

Paper Structure

This paper contains 47 sections, 15 theorems, 110 equations, 6 figures, 1 table, 3 algorithms.

Key Result

Lemma 2.1

Assume $v\colon \mathbb{R}^d \to \mathbb{R}^d$, $d\in\mathbb{N}$, has the form $v(x) = (v_1(x),0,\ldots,0)^{\top}$ for a $v_1\colon \mathbb{R}^d\to\mathbb{R}$ with TT rank $\bm r = (r_1,\ldots,r_{d-1})$ and $r_0=r_d=1$. Then, $v$ can be represented by a TT with rank $\bm r$ and $r_0=d$, $r_d=1$.
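One way to see why a statement like Lemma 2.1 can hold is to build the first core explicitly: placing the first core of $v_1$ in the first of $d$ slices of a new core, with zeros in the remaining slices, changes only $r_0$ from $1$ to $d$ and leaves all other cores untouched. The NumPy sketch below checks this numerically. The monomial basis, the dimension, the ranks and the helper names (`tt_eval`, `lift_to_vector_valued`) are illustrative assumptions and are not taken from the paper, whose proof may proceed differently.

```python
import numpy as np

def tt_eval(cores, basis_vals):
    """Evaluate a functional TT at one point.

    cores[k] has shape (r_{k-1}, n_k, r_k); basis_vals[k] has shape (n_k,)
    and holds the k-th univariate basis evaluated at x_k.
    Returns a vector of length r_0 (r_d is assumed to be 1).
    """
    out = np.einsum("anb,n->ab", cores[0], basis_vals[0])  # (r_0, r_1)
    for core, phi in zip(cores[1:], basis_vals[1:]):
        out = out @ np.einsum("anb,n->ab", core, phi)       # (r_0, r_k)
    return out[:, 0]

def lift_to_vector_valued(cores, d):
    """Build cores of v = (v_1, 0, ..., 0) from the cores of v_1.

    Only the first core changes: its left rank goes from 1 to d.
    """
    e1 = np.zeros((d, 1))
    e1[0, 0] = 1.0
    first = np.einsum("io,onb->inb", e1, cores[0])           # (d, n_1, r_1)
    return [first] + list(cores[1:])

# Illustrative setup (assumed, not from the paper).
rng = np.random.default_rng(0)
d, n = 4, 3                        # input dimension, basis size per variable
ranks = [1, 2, 3, 2, 1]            # r_0, ..., r_d for the scalar function v_1
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

x = rng.uniform(-1.0, 1.0, size=d)
basis_vals = [np.array([1.0, xk, xk**2]) for xk in x]  # monomials 1, x, x^2

v1 = tt_eval(cores, basis_vals)[0]                       # scalar value v_1(x)
lifted = lift_to_vector_valued(cores, d)
v = tt_eval(lifted, basis_vals)                          # vector (v_1(x), 0, ..., 0)
```

Here `v[0]` agrees with `v1` and the remaining entries of `v` vanish, while the interior ranks of `lifted` are the same as those of `cores`.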

Figures (6)

  • Figure 1: Tensor-train format with cores $\bm{U}_1,\dots,\bm{U}_d$
  • Figure 2: Visualization of the gradient fields $\nabla_\theta L(\theta)$ and $\widetilde{\nabla}_\theta L(\theta)$ for the model $u_\theta(x,y)=\exp(\theta_1 x + \cos(y-\theta_2))$ and the loss function $\mathcal{L}(u) = \frac{1}{2} \|u-u^*\|_{L^2([0,1]^2)}^2$.
  • Figure 3: Convergence plot for the optimizers Adam, NGD and L-BFGS for the recovery problem \ref{eq:recovery-2} in log-log scale, for dimensions $d=4,5$.
  • Figure 4: Relative $L^2$ error versus time for the optimizers Adam, NGD and L-BFGS for the recovery problem \ref{eq:recovery-2} in log-log scale, for dimensions $d=4,5$.
  • Figure 5: Condition number and rank of $G_\ell$, with $\ell=1,2$, for the recovery problem \ref{eq:recovery-2} for dimension $d=4$. The solid line is the median, and the envelope corresponds to the interquartile range.
  • ...and 1 more figures
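
The two gradient fields in the Figure 2 caption can be illustrated numerically. The listing does not define $\widetilde{\nabla}_\theta L$; the sketch below assumes it is the usual natural gradient $G(\theta)^{-1}\nabla_\theta L(\theta)$, with Gram matrix $G(\theta)_{ij} = \langle \partial_{\theta_i} u_\theta, \partial_{\theta_j} u_\theta \rangle_{L^2([0,1]^2)}$. The target $u^* = u_{\theta^*}$ with `theta_star = (1.0, 0.5)`, the evaluation point `theta`, and the 20-point Gauss-Legendre quadrature are also assumptions made only for illustration.

```python
import numpy as np

# Tensorized Gauss-Legendre quadrature on [0, 1]^2.
t, w = np.polynomial.legendre.leggauss(20)
nodes, weights = 0.5 * (t + 1.0), 0.5 * w
X, Y = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights, weights)

def u(theta):
    """Model from the Figure 2 caption: exp(theta_1 x + cos(y - theta_2))."""
    return np.exp(theta[0] * X + np.cos(Y - theta[1]))

def jac(theta):
    """Partial derivatives of u with respect to theta_1 and theta_2."""
    val = u(theta)
    return np.stack([X * val, np.sin(Y - theta[1]) * val])  # (2, n, n)

theta_star = np.array([1.0, 0.5])  # hypothetical target parameters
u_star = u(theta_star)

def loss(theta):
    """L(theta) = 1/2 * ||u_theta - u*||^2 in L^2([0,1]^2), via quadrature."""
    return 0.5 * np.sum(W * (u(theta) - u_star) ** 2)

def gradients(theta):
    """Return the Euclidean gradient, the Gram matrix and the natural gradient."""
    J = jac(theta)
    residual = u(theta) - u_star
    grad = np.einsum("inm,nm,nm->i", J, W, residual)   # <d_i u, u - u*>
    gram = np.einsum("inm,nm,jnm->ij", J, W, J)        # <d_i u, d_j u>
    nat_grad = np.linalg.solve(gram, grad)             # assumed definition
    return grad, gram, nat_grad

theta = np.array([0.2, -0.3])
grad, gram, nat_grad = gradients(theta)
```

Evaluating `gradients` on a grid of `theta` values yields two vector fields of the kind the caption describes; under the assumption above, the natural gradient rescales the Euclidean gradient by the inverse Gram matrix of the model's parameter derivatives.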

Theorems & Definitions (36)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Definition 3.1: Compositional tensors
  • Proposition 3.2: Linear affine maps written in the tensor-train format
  • proof
  • Corollary 3.3
  • Proposition 3.4
  • proof
  • ...and 26 more