Stochastic Gradient Descent for Incomplete Tensor Linear Systems

Anna Ma, Deanna Needell, Alexander Xue

TL;DR

This work extends stochastic gradient methods to incomplete tensor linear systems under the t-product. It introduces the mSGDT algorithm, whose update includes a correction term that keeps the gradient estimate unbiased under several missing-data models. The paper proves convergence for both diminishing and fixed step sizes and tests the approach on synthetic data and on real-world shuttle video tensors, where recovery remains robust despite substantial data loss. The framework generalizes prior matrix results to tensors and shows how the missing-data pattern changes the update direction and the convergence behavior, which suggests extensions to broader missing-data scenarios and to other tensor-tensor products.

Abstract

Solving large tensor linear systems poses significant challenges due to the high volume of data stored, and it only becomes more challenging when some of the data is missing. Recently, Ma et al. showed that this problem can be tackled using a stochastic gradient descent-based method, assuming that the missing data follows a uniform missing pattern. We adapt the technique by modifying the update direction, showing that the method is applicable under other missing data models. We prove convergence results and experimentally verify these results on synthetic data.
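To illustrate why the update direction has to be modified when data is missing, here is a minimal sketch for the simpler matrix (vector-row) analogue with uniformly missing entries. It is not the paper's tensor algorithm mSGDT, which this page does not reproduce. The function names and the test values are hypothetical. The sketch assumes each entry of a row `a` is observed independently with probability `p`, and it checks by exact enumeration over all masks that the corrected direction averages to the full-data gradient of `0.5 * (a @ x - b)**2`.

```python
import itertools
import numpy as np

def corrected_gradient(a_tilde, b, x, p):
    """Corrected stochastic gradient for one row with missing entries.

    a_tilde is the observed row (missing entries set to zero).
    Assumption: entries are observed independently with probability p.
    """
    residual_term = (a_tilde @ x - p * b) * a_tilde / p**2
    # diag(a_tilde a_tilde^T) x, computed entrywise
    correction = (1.0 - p) / p**2 * (a_tilde**2) * x
    return residual_term - correction

def expected_gradient(a, b, x, p):
    """Exact expectation over all 2^n observation masks (small n only)."""
    n = a.size
    total = np.zeros(n)
    for bits in itertools.product([0, 1], repeat=n):
        mask = np.array(bits, dtype=float)
        prob = np.prod(np.where(mask == 1, p, 1.0 - p))
        total += prob * corrected_gradient(mask * a, b, x, p)
    return total

# Hypothetical test values
rng = np.random.default_rng(0)
a = rng.standard_normal(4)
x = rng.standard_normal(4)
b = 0.7
p = 0.3

full_gradient = (a @ x - b) * a          # gradient with complete data
averaged = expected_gradient(a, b, x, p)  # mean of the corrected direction
print(np.allclose(averaged, full_gradient))
```

The enumeration is only feasible for tiny `n`; it serves as a check that the correction removes the bias, not as a way to run the method.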

Paper Structure

This paper contains 10 sections, 12 theorems, 59 equations, 6 figures, 1 algorithm.

Key Result

Lemma 1

(shamir2013stochastic Theorem 2) Suppose that $F$ is convex and ${\cal W}$ is a closed convex domain containing the solution ${\cal X}_\star$. Furthermore, suppose that for some constants $G$ and $K$, it holds that $\mathbb{E}[g({\cal X})] = \nabla F({\cal X})$ and $\mathbb{E}[\|g({\cal X})\|^2] \le

Figures (6)

  • Figure 1: Log-log plot for Algorithm 1 (mSGDT) under the uniform missing data model, for varying values of $p \in \{.3, .5, .7, .99\}$. The $x$-axis is the iteration, and the $y$-axis is the error, defined as the Frobenius norm of ${\cal X}^t - {\cal X}_\ast$.
  • Figure 2: Log-log plot for Algorithm 1 (mSGDT) under the column block missing data model, for block size $b = 4$ and for varying values of $p \in \{.3, .5, .7, .99\}$. The $x$-axis is the iteration, and the $y$-axis is the error, defined as the Frobenius norm of ${\cal X}^t - {\cal X}_\ast$.
  • Figure 3: Log-log plot for Algorithm 1 (mSGDT) under the frontal slice missing data model, for varying values of $p \in \{.3, .5, .7, .99\}$. The $x$-axis is the iteration, and the $y$-axis is the error, defined as the Frobenius norm of ${\cal X}^t - {\cal X}_\ast$.
  • Figure 4: First and last frames of shuttle dataset.
  • Figure 5: First and last frames of reconstructed tensor. Entries in ${\cal A}$ are uniformly observed with probability $p = 0.3$.
  • ...and 1 more figure
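The captions above refer to missing-data models and to an error measured in the Frobenius norm. A minimal sketch of how such an experiment could be set up follows. The uniform model matches the caption wording (entries observed independently with probability $p$). The frontal slice model here is an assumption: the sketch keeps or drops each frontal slice as a whole with probability $p$, which may differ from the paper's exact definition. The column block model is omitted because its definition is not given on this page. All function names are hypothetical.

```python
import numpy as np

def uniform_mask(shape, p, rng):
    """Each entry is observed independently with probability p."""
    return (rng.random(shape) < p).astype(float)

def frontal_slice_mask(shape, p, rng):
    """Assumption: each frontal slice (third index) is kept or dropped whole."""
    keep = (rng.random(shape[2]) < p).astype(float)
    return np.broadcast_to(keep, shape).copy()

def recovery_error(X_t, X_star):
    """Frobenius norm of X_t - X_star for tensors of any order."""
    return np.linalg.norm((X_t - X_star).ravel())

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 25, 40))

# Observed tensors: missing entries are set to zero
A_uniform = uniform_mask(A.shape, 0.3, rng) * A
A_slices = frontal_slice_mask(A.shape, 0.5, rng) * A

observed_fraction = np.mean(A_uniform != 0)
print(observed_fraction)
```

With $p = 0.3$ roughly 70% of the entries are zeroed out, which is the regime shown in Figure 5.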

Theorems & Definitions (24)

  • Definition 1: Tensor operations
  • Definition 2: t-product
  • Definition 3: Transpose, Hermitian
  • Definition 4: Norm
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Proof: Proof of the changing step size theorem (label thm:changingstepsize)
  • Lemma 2 (needell2014stochastic, Lemma A.1)
  • ...and 14 more
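Definition 2 in the list above is the t-product, which this page names but does not state. As a sketch, the standard formulation can be computed by taking an FFT along the third dimension, multiplying matching frontal slices, and inverting the FFT. The code below assumes real-valued third-order tensors stored as NumPy arrays and compares the FFT route against the direct circular-convolution form $C_k = \sum_j A_{(k-j) \bmod n_3} B_j$. The function names are hypothetical, and the paper's own notation may differ.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x l x n3) via FFT."""
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("incompatible tensor shapes for the t-product")
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices in the Fourier domain
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    # Real inputs give a real result up to rounding error
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_product_direct(A, B):
    """Same product from the circular-convolution form (slow reference)."""
    n1, _, n3 = A.shape
    C = np.zeros((n1, B.shape[1], n3))
    for k in range(n3):
        for j in range(n3):
            C[:, :, k] += A[:, :, (k - j) % n3] @ B[:, :, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))

C_fft = t_product(A, B)
C_direct = t_product_direct(A, B)

# Identity tensor: first frontal slice is the identity, the rest are zero
I = np.zeros((4, 4, 5))
I[:, :, 0] = np.eye(4)
A_times_I = t_product(A, I)
print(np.allclose(C_fft, C_direct), np.allclose(A_times_I, A))
```

The FFT route costs one matrix product per frontal slice, while the direct form needs $n_3^2$ of them, which is why the FFT version is the usual choice in practice.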