Matrix denoising: Bayes-optimal estimators via low-degree polynomials

Guilhem Semerjian

Abstract

We consider the additive version of the matrix denoising problem, where a random symmetric matrix $S$ of size $n$ has to be inferred from the observation of $Y=S+Z$, with $Z$ an independent random matrix modeling the noise. For prior distributions of $S$ and $Z$ that are invariant under conjugation by orthogonal matrices we determine, using results from first and second order free probability theory, the Bayes-optimal (in terms of the mean square error) polynomial estimators of degree at most $D$, asymptotically in $n$, and show that as $D$ increases they converge towards the estimator introduced by Bun, Allez, Bouchaud and Potters in [IEEE Transactions on Information Theory 62, 7475 (2016)]. We conjecture that this optimality holds beyond strictly orthogonally invariant priors, and provide partial evidence of this universality phenomenon when $S$ is an arbitrary Wishart matrix and $Z$ is drawn from the Gaussian Orthogonal Ensemble, a case motivated by the related extensive rank matrix factorization problem.
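As a purely numerical illustration of the setting described in the abstract, the sketch below draws one sample of $Y=S+Z$ and fits, by least squares, the coefficients of a scalar polynomial estimator $\sum_{k=0}^{D} c_k Y^k$. This is not the paper's method, which determines the optimal polynomials analytically in the large $n$ limit; the function names, the normalizations of the Wishart and GOE matrices, and the `noise_scale` parameter are assumptions made for this example only. The fit is done in-sample on a single realization, so the error it reports is only a rough finite-$n$ proxy.

```python
import numpy as np

def goe(n, rng):
    # Symmetric Gaussian matrix; this normalization is an assumption of the sketch.
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def wishart(n, alpha, rng):
    # Wishart-type matrix X X^T / m with m = alpha * n (assumed normalization).
    m = int(alpha * n)
    x = rng.standard_normal((n, m))
    return x @ x.T / m

def fit_poly_denoiser(s, y, degree):
    # Least-squares fit of S by sum_k c_k Y^k, k = 0..degree (in-sample).
    n = y.shape[0]
    powers = [np.eye(n)]
    for _ in range(degree):
        powers.append(powers[-1] @ y)
    design = np.stack([p.ravel() for p in powers], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, s.ravel(), rcond=None)
    estimate = sum(c * p for c, p in zip(coeffs, powers))
    # Squared Frobenius error divided by n.
    return coeffs, np.sum((s - estimate) ** 2) / n

rng = np.random.default_rng(0)
n, alpha, noise_scale = 80, 1.0, 0.5  # hypothetical values
S = wishart(n, alpha, rng)
Y = S + noise_scale * goe(n, rng)
for D in range(4):
    _, err = fit_poly_denoiser(S, Y, D)
    print(f"D={D}: in-sample error per dimension = {err:.4f}")
```

Because the polynomial spaces are nested, the in-sample error cannot increase with $D$, which mirrors qualitatively the ordering of the ${\rm MMSE}^{(D)}$ curves described in the figure captions.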

Paper Structure

This paper contains 40 sections, 149 equations, and 10 figures.

Introduction
Main results
Assumptions
The BABP denoiser
Optimality of the BABP denoiser for orthogonally invariant priors
Universality conjecture
Optimal and approximate Bayesian estimation
Bayes-optimal estimation
Approximations
Symmetries
Matrix denoising, orthogonally invariant case
Equivariant estimators
Scalar polynomial estimators
Equations at finite $n$
The large $n$ limit
...and 25 more sections

Figures (10)

Figure 1: Illustration of the non-crossing partition of Eq. (\ref{eq_NC}); for clarity the block containing 1 has been drawn above the horizontal axis, the other blocks below. Here $p+1=13$, the block containing the first element has cardinality $m=4$ and reads $\{1,4,5,9\}$, corresponding to $j_1=2$, $j_2=0$, $j_3=3$, the lengths of the successive intervals it does not cover. Because of the non-crossing condition the other blocks of the partition decompose into non-crossing partitions of the intervals not covered, of lengths $j_1=2$, $j_2=0$, $j_3=3$, $p+1-m-j_1-j_2-j_3=4$.
Figure 2: The curves of ${\rm MMSE}^{(D)}$ as a function of $\Delta$, for $\alpha=1$ (left panel) and $\alpha=5$ (right panel), different colors corresponding to different values of $D$; the black curve labeled BABP is ${\rm MSE}_{\rm BABP}$, the large $D$ limit of ${\rm MMSE}^{(D)}$. The insets present the ratios ${\rm MMSE}^{(D)}/{\rm MSE}_{\rm BABP}$, with the same color code as in the main plots.
Figure 3: The density $\rho_Y$ of the observation matrix (left panel), and the optimal denoising function $\mathcal{D}_{\rm BABP}$ along with its low degree approximations $\mathcal{D}^{(D)}$ (right panel), for $\alpha=1$ and $\Delta=0.2$.
Figure 4: The density $\rho_Y$ of the observation matrix (left panel), and the optimal denoising function $\mathcal{D}_{\rm BABP}$ along with its low degree approximations $\mathcal{D}^{(D)}$ (right panel), for $\alpha=5$ and $\Delta=0.2$.
Figure 5: The graphs in $\mathcal{A}_{\rm od}^{(2)}$ that contribute to the off-diagonal estimators at order 2, in the presence of the inversion symmetry. White (resp. black) circles represent the marked (resp. unmarked) vertices. The corresponding formulas for these four estimators can be found in equation (\ref{eq_babcd}).
...and 5 more figures
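
The decomposition described in the caption of Figure 1 can be checked with a few lines of code. The sketch below is only an illustration: the helper name `gaps_of_first_block` is hypothetical, and it merely computes the lengths of the intervals left uncovered by the block containing 1, using the values quoted in the caption.

```python
def gaps_of_first_block(block, size):
    """Lengths of the intervals of {1, ..., size} not covered by `block`.

    `block` must contain the element 1. Returns the gaps between
    consecutive elements of the block and the length of the interval
    lying after its last element.
    """
    elems = sorted(block)
    if elems[0] != 1:
        raise ValueError("the block must contain the element 1")
    inner = [b - a - 1 for a, b in zip(elems, elems[1:])]
    tail = size - elems[-1]
    return inner, tail

# Values from the caption of Figure 1: p + 1 = 13 and block {1, 4, 5, 9}.
inner, tail = gaps_of_first_block({1, 4, 5, 9}, 13)
print(inner, tail)  # [2, 0, 3] 4
```

The inner gaps reproduce $j_1=2$, $j_2=0$, $j_3=3$, and the last interval has length $p+1-m-j_1-j_2-j_3=4$, so that the block and the uncovered intervals together account for all $13$ elements.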
