Near optimal sample complexity for matrix and tensor normal models via geodesic convexity

Cole Franks, Rafael Oliveira, Akshay Ramachandran, Michael Walter

TL;DR

We study estimation of the Kronecker-structured covariance factors in matrix and tensor normal models, analyzing the MLE in the geometry on positive-definite matrices induced by the Fisher information metric. The key technical advance is a proof that, given enough samples, the negative log-likelihood is strongly geodesically convex; the proof goes through the expansion of certain random quantum channels. This yields near-optimal nonasymptotic sample complexity and nearly tight error rates in the Fisher-Rao and Thompson distances, without assuming well-conditioned or sparse factors or an accurate initial guess. The flip-flop algorithm is shown to converge linearly to the MLE with high probability in the same regimes, providing a practically efficient route to estimation. Lower bounds extend classical Gaussian results to the Kronecker-structured setting: the matrix normal bounds are minimax optimal up to logarithmic factors, and the tensor normal bounds for the largest factor and the overall covariance matrix are minimax optimal up to constant factors once there are enough samples for any estimator to reach constant Frobenius error. Together, these results give a condition-number-free theory for high-dimensional Kronecker-structured covariance estimation with scalable computation.

Abstract

The matrix normal model, i.e., the family of Gaussian matrix-variate distributions whose covariance matrices are the Kronecker product of two lower-dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. For these models, we show that the maximum likelihood estimator (MLE) achieves nearly optimal nonasymptotic sample complexity and nearly tight error rates in the Fisher-Rao and Thompson metrics. In contrast to prior work, our results do not rely on the factors being well-conditioned or sparse, nor do we need to assume an accurate enough initial guess. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for the overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that the flip-flop algorithm, a practical and widely used iterative procedure to compute the MLE, converges linearly with high probability. Our main technical insight is that, given enough samples, the negative log-likelihood function is strongly geodesically convex in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels.
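To make the flip-flop procedure mentioned in the abstract concrete, here is a minimal NumPy sketch for the matrix normal case ($k=2$). It uses the textbook alternating updates, which may differ in normalization or stopping rule from the variant analyzed in the paper. The function name `flip_flop`, the fixed iteration count, the identity initialization, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def flip_flop(X, iters=50):
    """Alternating MLE updates for a matrix normal model (illustrative sketch).

    X has shape (n, d1, d2): n samples, each a d1 x d2 matrix with
    row-major vec(X_i) ~ N(0, Theta1^{-1} kron Theta2^{-1}).
    Returns precision estimates (Theta1, Theta2); only their Kronecker
    product is identifiable, since (c*Theta1, Theta2/c) gives the same model.
    """
    n, d1, d2 = X.shape
    Theta1, Theta2 = np.eye(d1), np.eye(d2)  # assumed starting point
    for _ in range(iters):
        # Row factor: S1 = (1 / (n d2)) * sum_i X_i Theta2 X_i^T
        S1 = np.einsum("nij,jk,nlk->il", X, Theta2, X) / (n * d2)
        Theta1 = np.linalg.inv(S1)
        # Column factor: S2 = (1 / (n d1)) * sum_i X_i^T Theta1 X_i
        S2 = np.einsum("nji,jk,nkl->il", X, Theta1, X) / (n * d1)
        Theta2 = np.linalg.inv(S2)
    return Theta1, Theta2

def random_spd(rng, d):
    """Hypothetical helper: a random, reasonably conditioned SPD matrix."""
    M = rng.standard_normal((d, d))
    return M @ M.T / d + 0.5 * np.eye(d)

# Synthetic data: X_i = A Z_i B^T with Z_i i.i.d. standard normal,
# so that row-major vec(X_i) has covariance Sigma1 kron Sigma2.
rng = np.random.default_rng(0)
n, d1, d2 = 2000, 3, 4
Sigma1, Sigma2 = random_spd(rng, d1), random_spd(rng, d2)
A, B = np.linalg.cholesky(Sigma1), np.linalg.cholesky(Sigma2)
Z = rng.standard_normal((n, d1, d2))
X = np.einsum("ij,njk,lk->nil", A, Z, B)

Theta1_hat, Theta2_hat = flip_flop(X)
Theta_true = np.kron(np.linalg.inv(Sigma1), np.linalg.inv(Sigma2))
Theta_hat = np.kron(Theta1_hat, Theta2_hat)
rel_err = np.linalg.norm(Theta_hat - Theta_true) / np.linalg.norm(Theta_true)
print("relative Frobenius error of the Kronecker precision:", rel_err)
```

The error is measured on the Kronecker product rather than on the individual factors because the factors are determined only up to a reciprocal scaling.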

Paper Structure

This paper contains 32 sections, 62 theorems, 262 equations, 7 tables.

Key Result

Theorem 1

Let $\mathcal{N}(0, \Theta_1^{-1} \otimes \cdots \otimes \Theta_k^{-1})$ be a tensor normal distribution with $k \geq 2$, where each $\Theta_i$ is a positive definite matrix of dimension $d_i$, and let $D := \prod_{i=1}^k d_i$. Given a number of samples $n$ respecting the sample threshold $n \gtrsim \dots$ with high probability. Further, for the matrix normal model (i.e., $k=2$), the sample threshold is ...

Theorems & Definitions (127)

  • Definition 1.1: Fisher-Rao and Thompson distances
  • Remark 1.5
  • Theorem: Sample complexity, tensor normal model
  • Theorem 1.7: Lower bound for matrix normal models
  • Theorem: Computational estimation, informal
  • Definition 1.8
  • Definition 1.9: Precision matrices
  • Theorem 1.10: Tensor normal model sample complexity upper bounds
  • Theorem 1.11: Matrix normal model sample complexity upper bounds
  • Corollary 1.12: Estimating only $\Theta_1$
  • ...and 117 more
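
The Fisher-Rao and Thompson distances named in Definition 1.1 can be illustrated numerically. The sketch below uses the standard definitions on positive-definite matrices, $d_{\mathrm{FR}}(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$ and $d_{\mathrm{op}}(A,B) = \|\log(A^{-1/2} B A^{-1/2})\|_{\mathrm{op}}$; this is an assumption, since the text of Definition 1.1 is not reproduced here and the paper's normalization may differ. All function names are hypothetical.

```python
import numpy as np

def _log_spectrum(A, B):
    """Eigenvalues of log(A^{-1/2} B A^{-1/2}) for SPD matrices A, B."""
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = (V / np.sqrt(w)) @ V.T   # V diag(w^{-1/2}) V^T
    M = A_inv_sqrt @ B @ A_inv_sqrt
    M = (M + M.T) / 2                     # symmetrize against round-off
    return np.log(np.linalg.eigvalsh(M))

def fisher_rao_distance(A, B):
    """Frobenius norm of the matrix logarithm (assumed normalization)."""
    return float(np.linalg.norm(_log_spectrum(A, B)))

def thompson_distance(A, B):
    """Operator norm of the matrix logarithm (assumed normalization)."""
    return float(np.max(np.abs(_log_spectrum(A, B))))

# Small check: both distances are invariant under A -> G A G^T, B -> G B G^T,
# which is why bounds in these metrics do not depend on condition numbers.
A = np.eye(2)
B = np.diag([np.e, np.e ** 2])
G = np.array([[2.0, 1.0], [0.0, 3.0]])    # any invertible matrix
d_fr = fisher_rao_distance(A, B)          # sqrt(1^2 + 2^2) = sqrt(5)
d_th = thompson_distance(A, B)            # max(|1|, |2|) = 2
d_fr_G = fisher_rao_distance(G @ A @ G.T, G @ B @ G.T)
d_th_G = thompson_distance(G @ A @ G.T, G @ B @ G.T)
print(d_fr, d_th, d_fr_G, d_th_G)
```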