Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay
Yuetian Luo, Anru R. Zhang
TL;DR
This work develops a unified Riemannian optimization framework for tensor-on-tensor regression with unknown Tucker rank, using Riemannian Gradient Descent (RGD) and Riemannian Gauss-Newton (RGN) to recover a low Tucker-rank parameter under rank over-parameterization. The authors prove that RGD converges linearly and RGN quadratically to a statistically optimal estimate, even when the rank is over-specified, and show that both algorithms adapt to over-parameterization without any modification to their implementation. Using a low-degree polynomial argument, they establish a statistical-computational gap in scalar-on-tensor regression and show that, for tensors of order three or higher, moderate rank over-parameterization is essentially cost-free in sample complexity among computationally feasible estimators, unlike in the matrix case. The paper also provides practical spectral initializations, specialized results for scalar-on-tensor and tensor-on-vector problems, and extensive numerical experiments that corroborate the theory and demonstrate advantages over existing methods.
Abstract
We study tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates through a low Tucker rank parameter tensor/matrix, without prior knowledge of its intrinsic rank. We propose the Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the correctly-parameterized and over-parameterized rank settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without modifications to their implementation. We also prove a statistical-computational gap in scalar-on-tensor regression by a direct low-degree polynomial argument. Our theory demonstrates a "blessing of statistical-computational gap" phenomenon: in a wide range of scenarios in tensor-on-tensor regression for tensors of order three or higher, the sample size required by computationally feasible estimators already matches the sample size needed under moderate rank over-parameterization, while there is no such benefit in the matrix setting. This shows that moderate rank over-parameterization is essentially "cost-free" in terms of sample size in tensor-on-tensor regression of order three or higher. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.
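To make the setting concrete, the following is a minimal sketch of the scalar-on-tensor special case, y_i = <A_i, X*> + noise, fitted with both the true Tucker rank and an over-specified one. It is not the paper's RGD or RGN: it swaps in plain projected gradient descent with a truncated-HOSVD projection, which is simpler to write down. All names (`unfold`, `mode_product`, `hosvd_truncate`, `projected_gd`), the dimensions, the noise level, the step size, and the iteration count are assumptions chosen for illustration only.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k matricization: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))  # the new axis lands at position 0
    return np.moveaxis(out, 0, mode)

def hosvd_truncate(T, ranks):
    """Truncated HOSVD: project each mode onto its top-r_k left singular subspace."""
    out = T
    for k, r in enumerate(ranks):
        U = np.linalg.svd(unfold(T, k), full_matrices=False)[0][:, :r]
        out = mode_product(out, U @ U.T, k)
    return out

def projected_gd(A, y, ranks, n_iter=50, step=1.0):
    """Projected gradient descent for y_i = <A_i, X> + noise (illustrative only)."""
    n = len(y)
    # Spectral-style start: truncate the back-projection (1/n) * sum_i y_i A_i.
    X = hosvd_truncate(np.tensordot(y, A, axes=1) / n, ranks)
    for _ in range(n_iter):
        resid = np.tensordot(A, X, axes=X.ndim) - y   # A(X) - y, shape (n,)
        grad = np.tensordot(resid, A, axes=1) / n     # adjoint applied to the residual
        X = hosvd_truncate(X - step * grad, ranks)    # gradient step, then project
    return X

# Hypothetical problem sizes: a 6 x 6 x 6 parameter of Tucker rank (2, 2, 2).
rng = np.random.default_rng(0)
p, r_true, r_over, n, sigma = 6, 2, 3, 3000, 0.01

X_star = rng.standard_normal((r_true,) * 3)           # random core tensor
for k in range(3):
    Q, _ = np.linalg.qr(rng.standard_normal((p, r_true)))
    X_star = mode_product(X_star, Q, k)               # orthonormal factor per mode

A = rng.standard_normal((n, p, p, p))                 # Gaussian covariate tensors
y = np.tensordot(A, X_star, axes=3) + sigma * rng.standard_normal(n)

def rel_err(X):
    """Relative Frobenius error against the ground-truth parameter."""
    return np.linalg.norm(X - X_star) / np.linalg.norm(X_star)

X_exact = projected_gd(A, y, (r_true,) * 3)
X_over = projected_gd(A, y, (r_over,) * 3)
print(f"relative error, rank {r_true}: {rel_err(X_exact):.2e}")
print(f"relative error, rank {r_over} (over-specified): {rel_err(X_over):.2e}")
```

Comparing the two printed errors gives a rough feel for the abstract's claim that the same procedure can be run unchanged with an over-specified rank. The sketch uses a generous sample size for a small problem, so it says nothing about the sample-complexity thresholds or convergence rates established in the paper.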
