Online Tensor Learning: Computational and Statistical Trade-offs, Adaptivity and Optimal Regret

Jingyang Li, Jian-Feng Cai, Yang Chen, Dong Xia

TL;DR

A unified online Riemannian gradient descent (oRGrad) algorithm for tensor learning, which is computationally efficient, consumes much less memory, and can handle sequentially arriving data while making timely predictions.

Abstract

Algorithms for large-scale tensor learning are typically computationally expensive and require storing a vast amount of data. In this paper, we propose a unified online Riemannian gradient descent (oRGrad) algorithm for tensor learning, which is computationally efficient, consumes much less memory, and can handle sequentially arriving data while making timely predictions. The algorithm is applicable to both linear and generalized linear models. If the time horizon $T$ is known, oRGrad achieves statistical optimality with an appropriately chosen fixed step size. We find that noisy tensor completion particularly benefits from online algorithms: they avoid the trimming procedure and ensure a sharp entry-wise statistical error, which is often technically challenging for offline methods. The regret of oRGrad is analyzed, revealing a fascinating trilemma among the computational convergence rate, the statistical error, and the regret bound. With an appropriate constant step size, oRGrad achieves an $O(T^{1/2})$ regret. We then introduce the adaptive-oRGrad algorithm, which can achieve the optimal $O(\log T)$ regret by adaptively selecting step sizes, regardless of whether the time horizon is known. The adaptive-oRGrad algorithm can also attain a statistically optimal error rate without knowing the horizon. Comprehensive numerical simulations corroborate our theoretical findings. We show that oRGrad significantly outperforms its offline counterpart in predicting the solar F10.7 index with tensor predictors that monitor space weather impacts.
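To make the online update concrete, here is a minimal sketch of online Riemannian gradient descent in the simplest special case: an order-2 tensor (a rank-$r$ matrix) under the linear model with squared loss. It is an illustration under stated assumptions, not the paper's oRGrad. Each step projects the per-sample Euclidean gradient onto the tangent space of the fixed-rank manifold and retracts with a truncated SVD; the higher-order Tucker version, the GLM losses, and the adaptive step sizes are not reproduced. The function names `svd_retract`, `tangent_project`, and `online_rgrad`, the Gaussian design, the warm start, and the step size `eta=0.005` are all hypothetical choices.

```python
import numpy as np


def svd_retract(A, r):
    """Retraction onto rank-r matrices: keep the top-r SVD factors of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]


def tangent_project(G, U, V):
    """Project G onto the tangent space at a point with column space U, row space V."""
    UtG = U.T @ G
    return U @ UtG + (G @ V) @ V.T - U @ (UtG @ V) @ V.T


def online_rgrad(stream, M0, r, eta):
    """Single pass of online Riemannian GD with a constant step size eta.

    stream yields (X_t, y_t); the per-sample loss is 0.5 * (<X_t, M> - y_t)^2.
    Only the current rank-r factors are stored, never past samples.
    """
    U, s, Vt = svd_retract(M0, r)
    for X, y in stream:
        M = (U * s) @ Vt                        # current iterate U diag(s) V^T
        residual = np.sum(X * M) - y            # <X_t, M_t> - y_t
        G = residual * X                        # Euclidean gradient of the sample loss
        R = tangent_project(G, U, Vt.T)         # Riemannian gradient
        U, s, Vt = svd_retract(M - eta * R, r)  # gradient step, then retraction
    return (U * s) @ Vt


# Usage on synthetic, noise-free data (all settings hypothetical).
rng = np.random.default_rng(0)
d1, d2, r = 10, 10, 2
U_star, _ = np.linalg.qr(rng.standard_normal((d1, r)))
V_star, _ = np.linalg.qr(rng.standard_normal((d2, r)))
M_star = U_star @ np.diag([5.0, 3.0]) @ V_star.T


def gaussian_stream(T):
    """Yield T samples (X_t, y_t) with i.i.d. N(0, 1) design entries."""
    for _ in range(T):
        X = rng.standard_normal((d1, d2))
        yield X, np.sum(X * M_star)


M0 = M_star + 0.05 * rng.standard_normal((d1, d2))  # warm start near the truth
M_hat = online_rgrad(gaussian_stream(3000), M0, r, eta=0.005)
print("initial error:", np.linalg.norm(M0 - M_star))
print("final error:  ", np.linalg.norm(M_hat - M_star))
```

The loop touches each sample once and keeps only the rank-$r$ factors between steps, which is the kind of memory saving over offline methods that the abstract describes.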

Paper Structure (40 sections, 21 theorems, 433 equations, 6 figures, 2 tables, 6 algorithms)

Key Result

Theorem 1

Suppose Assumptions (X-design) through (GLM-trueT) hold, the initialization ${\boldsymbol{\mathcal{T}}}_0\in{\mathbb M}_{{\boldsymbol r}}$ satisfies $\| {\boldsymbol{\mathcal{T}}}_0 - {\boldsymbol{\mathcal{T}}}^* \|_{\rm{F}}\leq c_m\mu_{\alpha}^{-1}\gamma_{\alpha}\lambda_{{\textsf{\tiny min}}}$ fo… and the signal strength satisfies … where $C_0,\ldots,C_4>0$ are absolute constants, and $c_m, C_m>$ …
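The warm-start condition in Theorem 1 compares the initial Frobenius error with the signal strength $\lambda_{\min}$. As a hedged sketch, assuming the definition common in the low-rank Tucker literature, $\lambda_{\min}=\min_k \sigma_{r_k}(\mathcal{M}_k(\boldsymbol{\mathcal{T}}^*))$ (the smallest $r_k$-th singular value over the mode-$k$ unfoldings), the check can be written as below. Because the statement above is truncated and does not give values for $c_m$, $\mu_\alpha$, and $\gamma_\alpha$, they are passed in as plain numbers; the helper names `unfold`, `lambda_min`, and `warm_start_ok` are hypothetical.

```python
import numpy as np


def unfold(T, k):
    """Mode-k matricization: mode k indexes the rows."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)


def lambda_min(T, ranks):
    """Assumed definition: smallest r_k-th singular value over all mode unfoldings."""
    return min(np.linalg.svd(unfold(T, k), compute_uv=False)[r - 1]
               for k, r in enumerate(ranks))


def warm_start_ok(T0, T_star, ranks, c_m, mu_alpha, gamma_alpha):
    """Check ||T0 - T*||_F <= c_m * mu_alpha^{-1} * gamma_alpha * lambda_min."""
    radius = c_m * gamma_alpha * lambda_min(T_star, ranks) / mu_alpha
    return np.linalg.norm(T0 - T_star) <= radius


# Usage: a Tucker tensor with a superdiagonal core (hypothetical example).
rng = np.random.default_rng(1)
d, ranks = 6, (2, 2, 2)
C = np.zeros(ranks)
C[0, 0, 0], C[1, 1, 1] = 4.0, 2.0
Us = [np.linalg.qr(rng.standard_normal((d, 2)))[0] for _ in range(3)]
T_star = np.einsum("abc,ia,jb,kc->ijk", C, *Us)  # C x_1 U1 x_2 U2 x_3 U3
print(lambda_min(T_star, ranks))
```

With orthonormal factors, each unfolding inherits the singular values of the corresponding core unfolding, so this example has $\lambda_{\min}=2$.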

Figures (6)

  • Figure 1: Scree plot for rank selection
  • Figure 2: Convergence dynamics of oRGrad for online tensor linear regression and completion.
  • Figure 3: Average per-step runtime versus the cube of the dimension
  • Figure 4: Regret performance. Left: a constant step size; Right: adaptive step sizes.
  • Figure 5: Example of one slice in the tensor covariate and the response
  • ...and 1 more figure
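Figure 4 reports regret. Assuming the usual online-learning definition with the true tensor as comparator, the cumulative regret after $t$ steps is $\sum_{s\le t}[\ell_s(\boldsymbol{\mathcal{T}}_s)-\ell_s(\boldsymbol{\mathcal{T}}^*)]$, the excess loss of the online predictions over the truth. The paper's exact definition may differ. A minimal sketch for the squared loss, with a hypothetical function name and arguments:

```python
def cumulative_regret(preds, oracle_preds, ys):
    """Running regret under the squared loss 0.5 * (prediction - y)^2.

    preds[t]        : <X_t, T_t>, the online prediction made before y_t arrives
    oracle_preds[t] : <X_t, T*>, the prediction of the comparator (true) tensor
    ys[t]           : the revealed response y_t
    """
    total, path = 0.0, []
    for p, q, y in zip(preds, oracle_preds, ys):
        total += 0.5 * (p - y) ** 2 - 0.5 * (q - y) ** 2
        path.append(total)
    return path


regret = cumulative_regret([1.0, 2.0], [1.0, 1.0], [1.0, 1.0])  # → [0.0, 0.5]
```

According to the abstract, this curve grows like $O(T^{1/2})$ for oRGrad with a suitable constant step size and like $O(\log T)$ for adaptive-oRGrad.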

Theorems & Definitions (38)

  • Example 1: linear regression
  • Example 2: logistic regression
  • Example 3: Poisson regression
  • Example 4: noisy tensor completion
  • Example 5: binary tensor learning
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • ...and 28 more
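In Examples 1 to 4, the per-sample loss depends on the tensor only through $\langle \boldsymbol{\mathcal{X}}_t, \boldsymbol{\mathcal{T}}\rangle$, so by the chain rule its Euclidean gradient is $\ell'(\langle\boldsymbol{\mathcal{X}}_t,\boldsymbol{\mathcal{T}}\rangle, y_t)\,\boldsymbol{\mathcal{X}}_t$. An online Riemannian method would then project this gradient onto the tangent space. The sketch below assumes the standard negative log-likelihood forms with canonical links, and models noisy completion with a one-hot design tensor, as is common. The paper's exact losses, scalings, and its binary tensor learning model may differ, and the function names are hypothetical.

```python
import numpy as np


def dloss(model, eta, y):
    """Derivative of the per-sample loss w.r.t. the linear predictor eta = <X, T>."""
    if model == "linear":      # loss 0.5 * (eta - y)^2
        return eta - y
    if model == "logistic":    # loss log(1 + exp(eta)) - y * eta, y in {0, 1}
        return 1.0 / (1.0 + np.exp(-eta)) - y
    if model == "poisson":     # loss exp(eta) - y * eta, y in {0, 1, 2, ...}
        return np.exp(eta) - y
    raise ValueError(f"unknown model: {model}")


def euclidean_grad(model, X, T, y):
    """Chain rule: the gradient of l(<X, T>, y) in T is l'(<X, T>, y) * X."""
    eta = float(np.sum(X * T))
    return dloss(model, eta, y) * X


def completion_grad(T, index, y):
    """Noisy completion with a one-hot design at `index`: only one entry is
    nonzero, so the dense design tensor X_t never needs to be formed."""
    G = np.zeros_like(T)
    G[index] = T[index] - y
    return G
```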