Optimal Matrix-Mimetic Tensor Algebras via Variable Projection

Elizabeth Newman, Katherine Keegan

TL;DR

This work develops a matrix-mimetic tensor framework in which a learnable invertible transform $\mathbf{M}$ governs the underlying tensor algebra via the $\star_{\mathbf{M}}$-product. Using variable projection, the authors couple the transform with the desired tensor representation, so that $\mathbf{M}$ and the tensor factors are learned simultaneously while matrix-like optimality properties, such as the Eckart–Young-like theorem for the $t$-SVDM, are preserved. They establish invariance and uniqueness results, derive derivatives through the $t$-SVDM, and provide convergence guarantees for a Riemannian-gradient-based optimization on the orthogonal group. In experiments on financial index tracking, image compression, and reduced-order modeling, the learned transform $\mathbf{M}^*$ consistently yields higher-quality, more compressible representations than heuristic transforms. The framework applies across these settings, remains robust under noise, and offers a principled way to learn data-adaptive tensor algebras, with practical computational trade-offs and publicly available code.

Abstract

Recent advances in matrix-mimetic tensor frameworks have made it possible to preserve linear algebraic properties for multilinear data analysis and, as a result, to obtain optimal representations of multiway data. Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogously to matrices. Underlying the tensor operation is an algebraic framework parameterized by an invertible linear transformation. The choice of linear mapping is crucial to representation quality and, in practice, is made heuristically based on expected correlations in the data. However, in many cases, these correlations are unknown and common heuristics lead to suboptimal performance. In this work, we simultaneously learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data. Our new framework explicitly captures the coupling between the transformation and representation using variable projection. We preserve the invertibility of the linear mapping by learning orthogonal transformations with Riemannian optimization. We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm. We demonstrate the generality of our framework through numerical experiments on a wide range of applications, including financial index tracking, image compression, and reduced-order modeling. We have published all the code related to this work at https://github.com/elizabethnewman/star-M-opt.
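
To make the coupling described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the linked repository for that). It assumes an orthogonal $\mathbf{M}$, a third-order tensor, and a truncation parameter `k`. For a fixed $\mathbf{M}$, the inner problem is solved in closed form by truncating an SVD of each frontal slice in the transform domain, which leaves an objective that depends on $\mathbf{M}$ alone. The function names `mode3_product`, `truncated_tsvdm`, and `reduced_objective` are hypothetical and chosen for illustration.

```python
import numpy as np


def mode3_product(A, M):
    """Apply M along the third mode: out[:, :, i] = sum_j M[i, j] * A[:, :, j]."""
    return np.einsum("ij,abj->abi", M, A)


def truncated_tsvdm(A, M, k):
    """Truncation-k approximation of A in the algebra induced by an orthogonal M.

    Assumption: M is orthogonal, so its inverse is M.T.
    """
    A_hat = mode3_product(A, M)              # move to the transform domain
    B_hat = np.empty_like(A_hat)
    for i in range(A.shape[2]):              # truncated SVD of each frontal slice
        U, s, Vt = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        B_hat[:, :, i] = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return mode3_product(B_hat, M.T)         # return to the original domain


def reduced_objective(A, M, k):
    """Squared Frobenius error after eliminating the representation for fixed M."""
    return np.linalg.norm(A - truncated_tsvdm(A, M, k)) ** 2
```

Because an orthogonal $\mathbf{M}$ preserves the Frobenius norm, this reduced objective equals the sum of the squared discarded singular values over all transform-domain slices. An outer optimizer over $\mathbf{M}$ would therefore only ever need to evaluate and differentiate this one function.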

Paper Structure (60 sections, 117 equations, 15 figures, 4 tables)

This paper contains 60 sections, 117 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Visualization of a third-order tensor $\boldsymbol{\mathcal{A}}\in \mathbb{R}^{n_1\times n_2\times n_3}$ as a matrix of tubes and various partitions and unfoldings.
  • Figure 1: Convergence of $\star_{\mathbf{M}}$-optimization for \ref{eq:angleObjFctn}. (Left): Visualization of $\overline{\phi}$ with initial angles indicated (diamonds). There are four minima (colored circles) and four maxima (gray squares). (Middle): Convergence of $\overline{\phi}$ using a fixed step size of $\alpha = 10^{-1}$. Here, $\alpha = \frac{1}{L^*}$ where $L^* = 10 \ge \max_{\theta} |\overline{\phi}''(\theta)|$. For all initial angles (save the maxima), $\star_{\mathbf{M}}$-optimization converges to the closest minimum, satisfying \ref{cor:convergence}. The periodicity of $\overline{\phi}$ leads to the same convergence values for initializations equidistant from the optima (i.e., diamonds of the same color follow the same convergence behavior). (Right): Convergence of the norm of $\operatorname{grad} \overline{\phi}$. The convergence rate is asymptotically linear with $\|\operatorname{grad}(\theta_j)\|_F \approx 0.64 \|\operatorname{grad}(\theta_{j-1})\|_F$. Thus, $\|\operatorname{grad}(\theta_{j})\|_F \le \varepsilon$ in $\mathcal{O}(\log(1 / \varepsilon))$ iterations, which is within the guarantees of \ref{cor:convergence}.
  • Figure 1: Tensorized index tracking per sector for various choices of $\mathbf{M}$. The learned $\mathbf{M}^*$ tracks the sector indices best on the historical data and does comparatively well across sectors on future data.
  • Figure 1: Illustration of optimization on a manifold. We update our current iterate (white dot at top) along the manifold (gray) following the geodesic $\gamma(t)$ (blue). From an implementation perspective, we first compute the Euclidean gradient (magenta arrow $\mathbf{S}$) and decompose it into a tangent direction (cyan arrow $\mathbf{S}_T$ on the green tangent bundle $T_\mathcal{M}(\mathbf{M})$) and a normal direction (black arrow $\mathbf{S}_N$). We step along the tangent direction and then retract onto the manifold (large, lower white dot).
  • Figure 1: Convergence of $\star_{\mathbf{M}}$-optimization for $t$-linear regression for various noise levels.
  • ...and 10 more figures
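
The manifold-update caption above (Euclidean gradient $\mathbf{S}$, tangent part $\mathbf{S}_T$, normal part $\mathbf{S}_N$, then retraction) can be sketched for the orthogonal group as follows. This is an illustrative sketch under stated assumptions rather than the paper's code: the tangent space at $\mathbf{M}$ is taken to be $\{\mathbf{M}\boldsymbol{\Omega} : \boldsymbol{\Omega}^\top = -\boldsymbol{\Omega}\}$ with the Frobenius inner product, and a QR-based retraction stands in for whichever retraction or geodesic step the authors use. All function names are hypothetical.

```python
import numpy as np


def tangent_projection(M, S):
    """Split a Euclidean gradient S into tangent and normal parts at orthogonal M."""
    X = M.T @ S
    S_T = M @ (X - X.T) / 2   # tangent part: M times a skew-symmetric matrix
    S_N = M @ (X + X.T) / 2   # normal part: M times a symmetric matrix
    return S_T, S_N


def qr_retraction(Y):
    """Map Y back onto the orthogonal group via its QR factor (assumed retraction)."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs          # flip columns so that R would have a positive diagonal


def riemannian_step(M, S, alpha):
    """One gradient step: move along the tangent direction, then retract."""
    S_T, _ = tangent_projection(M, S)
    return qr_retraction(M - alpha * S_T)
```

The sign correction makes the QR factor unique, so the retraction of a point already on the manifold returns that same point.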

Theorems & Definitions (42)

  • Remark
  • Definition 2.1: mode-$3$ unfolding
  • Definition 2.2: mode-$3$ product
  • Remark
  • Definition 2.3: $\star_{\mathbf{M}}$-tubal multiplication
  • Definition 2.4: $\star_{\mathbf{M}}$-product
  • Definition 2.5: $\star_{\mathbf{M}}$-transpose
  • Definition 2.6: f-diagonal
  • Definition 2.7: $\star_{\mathbf{M}}$-identity tube
  • Definition 2.8: $\star_{\mathbf{M}}$-identity
  • ...and 32 more
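
The listed definitions (mode-$3$ product, $\star_{\mathbf{M}}$-product, $\star_{\mathbf{M}}$-transpose, $\star_{\mathbf{M}}$-identity) are not reproduced on this page. As a rough guide, the sketch below follows the standard construction from the matrix-mimetic tensor literature, assuming real-valued data and a real invertible $\mathbf{M}$: transform along the third mode, multiply frontal slices pairwise, and transform back. The function names are hypothetical, and the paper's own definitions take precedence where they differ.

```python
import numpy as np


def mode3_product(A, M):
    """Mode-3 product: out[:, :, i] = sum_j M[i, j] * A[:, :, j]."""
    return np.einsum("ij,abj->abi", M, A)


def star_m(A, B, M):
    """Product of A (n1 x p x n3) and B (p x n2 x n3) for an invertible M (n3 x n3)."""
    A_hat = mode3_product(A, M)
    B_hat = mode3_product(B, M)
    C_hat = np.einsum("aci,cbi->abi", A_hat, B_hat)   # slice-by-slice matrix products
    return mode3_product(C_hat, np.linalg.inv(M))


def star_m_identity(n, M):
    """Tensor whose transform-domain frontal slices are all n x n identity matrices."""
    I_hat = np.repeat(np.eye(n)[:, :, None], M.shape[0], axis=2)
    return mode3_product(I_hat, np.linalg.inv(M))


def star_m_transpose(A, M):
    """Transpose each frontal slice in the transform domain (real-valued case)."""
    A_hat = mode3_product(A, M)
    return mode3_product(np.transpose(A_hat, (1, 0, 2)), np.linalg.inv(M))
```

With these helpers, multiplying by the identity tensor leaves a tensor unchanged, and the transpose reverses the order of a product, mirroring the corresponding matrix identities.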