Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications
Yizhou Xu, Antoine Maillard, Lenka Zdeborová, Florent Krzakala
TL;DR
The paper provides a rigorous, high-dimensional characterization of Bayes-optimal learning in matrix sensing with high-rank, rotationally invariant signals. It derives a replica-symmetric free entropy framework that yields sharp limits for mutual information and MMSE, and extends these results to rectangular models and diverse data priors through Gaussian universality and adaptive interpolation. The work demonstrates concrete applications to neural-network-like models and bilinear sequence regression, validating physics-based predictions with formal proofs and connecting matrix denoising to broader learning problems. These methods offer a path to a rigorous understanding of learnability in complex structured models, and they establish fundamental limits for learning from token sequences and for neural networks with quadratic activations and width proportional to the dimension.
Abstract
In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We consider this model in the high-dimensional limit: while previous works on this model primarily focused on the recovery of low-rank matrices, we consider in this work more general classes of structured signal matrices with potentially large rank, e.g., a product of two matrices of sizes proportional to the dimension. We provide rigorous asymptotic equations characterizing the Bayes-optimal learning performance from a number of samples which is proportional to the number of entries in the matrix. Our proof is composed of three key ingredients: $(i)$ we prove universality properties to handle structured sensing matrices, related to the "Gaussian equivalence" phenomenon in statistical learning, $(ii)$ we provide a sharp characterization of Bayes-optimal learning in generalized linear models with Gaussian data and structured matrix priors, generalizing previously studied settings, and $(iii)$ we leverage previous works on the problem of matrix denoising. The generality of our results allows for a variety of applications: notably, we mathematically establish predictions obtained via non-rigorous methods from statistical physics in [ETB+24] regarding Bilinear Sequence Regression, a benchmark model for learning from sequences of tokens, and in [MTM+24] on Bayes-optimal learning in neural networks with quadratic activation function, and width proportional to the dimension.
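
As a purely illustrative sketch of the observation model described above (it is not taken from the paper), the following NumPy snippet draws a toy matrix sensing instance: a signal formed as a product of two factor matrices, Gaussian sensing matrices, and noisy linear projections, with the number of samples proportional to the number of matrix entries. The function name `matrix_sensing_data`, the parameters `d`, `m`, `alpha`, `noise_std`, and all normalizations (the `1/sqrt(m)` and `1/d` factors) are assumptions chosen for readability and may differ from the conventions used in the paper.

```python
import numpy as np

def matrix_sensing_data(d=20, m=10, alpha=0.5, noise_std=0.1, seed=0):
    """Draw a toy matrix sensing instance (hypothetical scalings, for illustration only)."""
    rng = np.random.default_rng(seed)
    # Number of samples proportional to the number of entries d*d of the signal.
    n = int(alpha * d * d)
    # Structured signal: product of two d x m Gaussian factors (rank at most m).
    U = rng.standard_normal((d, m))
    V = rng.standard_normal((d, m))
    S = U @ V.T / np.sqrt(m)
    # One Gaussian sensing matrix A_i per observation.
    A = rng.standard_normal((n, d, d))
    # Observations: y_i = <A_i, S> / d + Gaussian noise, with <A, S> = Tr(A^T S).
    y = np.einsum("nij,ij->n", A, S) / d + noise_std * rng.standard_normal(n)
    return A, S, y

A, S, y = matrix_sensing_data()
```

Here `alpha` plays the role of the sample ratio (samples per matrix entry) and `m / d` the ratio between the inner dimension of the product and the ambient dimension; both stay fixed as `d` grows in the proportional regime the abstract describes.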
