
On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance

Jean Barbier, Francesco Camilli, Justin Ko, Koki Okajima

TL;DR

The paper investigates Bayesian matrix denoising of an extensive-rank signal ${\mathbf X}{\mathbf X}^{\intercal}$ that lacks rotational invariance, aiming to map its information-theoretic limits in a phase diagram. It develops a multiscale mean-field framework that combines cavity-method reductions with effective scalar inference problems to compute the MMSE and the mutual information (MI), and introduces a complete ansatz that unifies the universal (matrix-model) and non-universal (factorisation) regimes. A key finding is a denoising-factorisation transition along a line $\lambda_c(\alpha)$: for discrete priors, universality holds below this line and breaks down beyond it, where better denoising becomes possible but appears algorithmically hard. In the denoising phase the framework connects to replica calculations and to HCIZ-based matrix-model results, while in the factorisation phase it provides a mean-field description conjectured to be exact, offering insight into when full factorisation of ${\mathbf X}$ is information-theoretically possible and how it may be achieved or approximated in practice.

Abstract

Matrix denoising is central to signal processing and machine learning. Its statistical analysis when the matrix to infer has a factorised structure with a rank growing proportionally to its dimension remains a challenge, except when the matrix is rotationally invariant. In that case the information-theoretic limits and an efficient Bayes-optimal denoising algorithm, called the rotational invariant estimator (RIE) [1,2], are known. Beyond this setting, few results exist. The reason is that the model is neither a usual spin system, because of the growing rank dimension, nor a matrix model (as appearing in high-energy physics), because of the lack of rotation symmetry, but rather a hybrid between the two. Here we make progress towards understanding Bayesian matrix denoising when the signal is a factored matrix $XX^\intercal$ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a denoising-factorisation transition separating a phase where denoising with the RIE remains Bayes-optimal, owing to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible, though algorithmically hard. We argue that only beyond the transition does factorisation, i.e., estimating $X$ itself, become possible, up to irresolvable ambiguities. On the theory side, we combine mean-field techniques in an interpretable multiscale fashion to access the minimum mean-square error and the mutual information. Interestingly, our alternative method yields equations that can also be obtained with the replica approach of [3]. Using numerical insights, we delimit the portion of the phase diagram where we conjecture the mean-field theory to be exact, and correct it using universality where it is not. Our complete ansatz matches the numerics well across the whole phase diagram once finite-size effects are taken into account.
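For orientation, the RIE mentioned above [1,2] can be sketched in the special case of additive Gaussian (GOE) noise. It keeps the eigenvectors of the observed matrix and shrinks each eigenvalue $\lambda_i$ to $\xi_i = \lambda_i - 2\sigma^2 h(\lambda_i)$, where $h$ is the real part of the Stieltjes transform of the observed spectrum. This is a minimal sketch, not the authors' code: the normalisation $Y = S + \sigma W$ with $W$ a GOE matrix whose spectrum tends to $[-2,2]$, the Cauchy-kernel smoothing width $\eta = N^{-1/2}$, the Gaussian-factor signal $S = XX^\intercal/N$, and the helper names `goe`, `rie_gaussian_noise` and `mse` are all hypothetical choices that may differ from the paper's setup.

```python
import numpy as np


def goe(n, rng):
    """GOE matrix scaled so that its spectrum tends to the semicircle on [-2, 2]."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)


def rie_gaussian_noise(y, sigma, eta=None):
    """RIE sketch for Y = S + sigma * W, with W a GOE matrix as in `goe`.

    Keeps the eigenvectors of Y and replaces each eigenvalue lam_i by
    xi_i = lam_i - 2 sigma^2 h(lam_i), where h is the real part of the
    Stieltjes transform of the empirical spectrum of Y, smoothed with a
    Cauchy kernel of (hypothetical) width eta.
    """
    n = y.shape[0]
    lam, u = np.linalg.eigh(y)
    if eta is None:
        eta = n ** -0.5
    diff = lam[:, None] - lam[None, :]
    # Re[(1/n) sum_j 1 / (lam_i + i*eta - lam_j)]; the j = i term contributes 0
    h = np.mean(diff / (diff**2 + eta**2), axis=1)
    xi = lam - 2 * sigma**2 * h
    return (u * xi) @ u.T                    # sum_i xi_i u_i u_i^T


def mse(a, b):
    """Squared Frobenius error normalised by the dimension."""
    return np.sum((a - b) ** 2) / a.shape[0]


rng = np.random.default_rng(0)
n, m, sigma = 400, 200, 1.0                  # alpha = m / n = 0.5
x = rng.standard_normal((n, m))
s = x @ x.T / n                              # hypothetical signal normalisation
y = s + sigma * goe(n, rng)

naive_err = mse(s, y)                        # estimate S by Y itself
zero_err = mse(s, np.zeros_like(s))          # estimate S by 0
rie_err = mse(s, rie_gaussian_noise(y, sigma))
print(naive_err, zero_err, rie_err)
```

With a Gaussian factor $X$ the signal is rotationally invariant, so this is the regime where the RIE is expected to be Bayes-optimal; the abstract's point is that for discrete priors and beyond the transition this eigenvalue-only shrinkage is no longer optimal.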


Paper Structure

This paper contains 28 sections, 2 theorems, 136 equations, 16 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

The diagonal part of the data $(Y_{ii})_{i\leq N}$ does not contribute to the MI in the high-dimensional limit. Specifically, the inference problem has the same asymptotic mutual information density between data and signal as the channel (eq:channel0).

Figures (16)

  • Figure 1: The top panel illustrates the Baik-Ben Arous-Péché (BBP) transition occurring in the rank-one spiked matrix model. In orange is the histogram of eigenvalues of ${\mathbf Y}$ when $M=1$. Before the transition it closely matches Wigner's semicircle law (blue) and its top eigenvalue (red line) sticks to the edge of the bulk; all eigenvectors of ${\mathbf Y}$ have a vanishing $o_N(1)$ overlap with the hidden signal ${\mathbf X}$. The BBP transition is marked by an outlier eigenvalue detaching from the bulk. The associated eigenvector aligns non-trivially with ${\mathbf X}$ and can thus serve as a spectral estimator. The bottom panel shows the phase diagram of the model from the Bayesian (information-theoretic) perspective, with sparse prior $P_X= \rho \delta_0+\tau \delta_{-a}+(1-\tau-\rho) \delta_{b}$ for a suitable choice of parameters (see, e.g., XXTmiolane2019fundamental). Impossible, hard, and easy inference phases appear for typical realisations of (eq:channel0) as the signal strength increases, delimited by information-theoretic and algorithmic transitions. The performance of the Bayes-optimal estimator, the spectral estimator, and Bayesian approximate message passing is shown. For this prior $P_X$, the message-passing and spectral algorithms have the same transition, but the Bayesian algorithm outperforms the spectral one because it exploits the prior, while the latter does not.
  • Figure 2: Monte Carlo results for the mutual information with Rademacher prior (left panel) and discrete prior $P_X=\frac{1}{4} \delta_{\sqrt{3}}+\frac{3}{4} \delta_{-1/\sqrt{3}}$ (right panel) for various sizes with $\alpha=0.5$, i.e., $M=N/2$. Error bars represent the standard error of the mean. The curves are compared to the exact infinite-size limit (MIspherical) for a standard Gaussian prior, computed using the HCIZ integral. The red dashed curve corresponds to the Shannon entropy of the prior considered in each panel, which upper-bounds the MI at all sizes.
  • Figure 3: Monte Carlo results for the MMSE for $N=10$ (left panel), $20$ (middle) and $40$ (right) with Rademacher prior (solid colored lines) and $\alpha=0.5$. They are compared to the MSE of the RIE for both Rademacher and Gaussian priors at the corresponding $N$. Error bars for the RIE are omitted because they are too small to be visible. Error bars for the MMSE represent standard errors of the mean. Insets are in log scale.
  • Figure 4: Monte Carlo results for the MMSE for $N=10$, $20$ and $40$ with prior $P_X=\frac{1}{4} \delta_{\sqrt{3}}+\frac{3}{4} \delta_{-1/\sqrt{3}}$ and $\alpha=0.5$. Error bars represent standard errors of the mean.
  • Figure 5: MSE of the Metropolis-Hastings algorithm for estimating ${\mathbf X}{\mathbf X}^\intercal$ with $(N,M)=(40,28)$, so $\alpha=0.7$, for each estimator (red $\circ$, blue $\triangle$ and black $\diamond$ markers) described in the main text under "Experimental setup". Before the transition all markers overlap. They are compared to the RIE performance in the large-system limit (blue line). Experimental markers are averaged over $36$ i.i.d. instances of the problem $({\mathbf X},{\mathbf Z})$.
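The BBP transition of Figure 1 can be reproduced numerically in a few lines. The sketch below assumes a normalisation in which the transition sits at `snr = 1`; this scaling, the Rademacher spike, and the helper names `spiked_top_eigenpair` and `bbp_prediction` are hypothetical and may differ from the paper's channel (eq:channel0). It relies on standard results for the rank-one spiked Wigner model: with $\theta=\sqrt{\mathrm{snr}}>1$ the top eigenvalue converges to $\theta+1/\theta$ and the squared overlap of the top eigenvector with the normalised spike to $1-1/\theta^2$, while for $\theta\le 1$ the eigenvalue sticks to the bulk edge $2$ and the overlap vanishes.

```python
import numpy as np


def spiked_top_eigenpair(n, snr, rng):
    """Top eigenvalue of a rank-one spiked Wigner matrix and the squared overlap
    of its top eigenvector with the hidden spike.

    Hypothetical normalisation: Y = (sqrt(snr) / n) x x^T + W, with x Rademacher
    (so |x|^2 = n) and W a GOE matrix whose spectrum tends to [-2, 2].
    """
    x = rng.choice([-1.0, 1.0], size=n)
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2 * n)          # off-diagonal variance 1/n
    y = np.sqrt(snr) / n * np.outer(x, x) + w
    vals, vecs = np.linalg.eigh(y)           # eigenvalues in ascending order
    top_vec = vecs[:, -1]                    # unit-norm top eigenvector
    overlap_sq = (top_vec @ x) ** 2 / n      # <v, x/|x|>^2
    return vals[-1], overlap_sq


def bbp_prediction(snr):
    """Large-n limits of the top eigenvalue and squared overlap, theta = sqrt(snr)."""
    theta = np.sqrt(snr)
    if theta <= 1:
        return 2.0, 0.0                      # top eigenvalue sticks to the bulk edge
    return theta + 1 / theta, 1 - 1 / theta**2


rng = np.random.default_rng(0)
for snr in (0.25, 4.0):
    top, ov = spiked_top_eigenpair(1000, snr, rng)
    print(snr, top, ov, bbp_prediction(snr))
```

Below the transition the printed overlap is of order $1/N$, so the top eigenvector carries essentially no information about the spike; above it, the outlier and a macroscopic overlap appear, which is what makes the spectral estimator in Figure 1 work.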
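Two facts behind the comparison in Figure 2 can be checked directly. Both discrete priors are centred with unit variance, so they share the first two moments of the standard Gaussian prior used as the HCIZ reference; and for a discrete prior $I({\mathbf X};{\mathbf Y}) \le H({\mathbf X}) = NM\,H(P_X)$, so the MI per entry of ${\mathbf X}$ is at most $H(P_X)$. The sketch below assumes the red dashed curves show this entropy per entry in nats; that reading of the normalisation, and the helper name `discrete_prior_stats`, are hypothetical.

```python
import numpy as np


def discrete_prior_stats(atoms, probs):
    """Mean, variance and Shannon entropy (in nats) of a discrete prior P_X."""
    atoms = np.asarray(atoms, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean = np.sum(probs * atoms)
    var = np.sum(probs * atoms**2) - mean**2
    entropy = -np.sum(probs * np.log(probs))
    return mean, var, entropy


# Rademacher prior: mean 0, variance 1, entropy ln 2 (about 0.693 nats)
rad_mean, rad_var, rad_H = discrete_prior_stats([1.0, -1.0], [0.5, 0.5])

# P_X = 1/4 delta_{sqrt 3} + 3/4 delta_{-1/sqrt 3}: mean 0, variance 1,
# entropy about 0.562 nats
asym_mean, asym_var, asym_H = discrete_prior_stats(
    [np.sqrt(3), -1 / np.sqrt(3)], [0.25, 0.75]
)
print(rad_mean, rad_var, rad_H)
print(asym_mean, asym_var, asym_H)
```

Matching the first two moments is what makes the Gaussian-prior curve a meaningful baseline: in the universal phase the MI is expected to coincide with it, and departures from it signal the breakdown of universality.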

Theorems & Definitions (3)

  • Proposition 1: Information irrelevance of the data diagonal components
  • proof
  • Proposition 2: Nishimori identity
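For orientation, the standard form of the Nishimori identity in Bayes-optimal inference is recalled below; the paper's own statement and normalisation may differ. With ${\mathbf X}$ drawn from the prior, ${\mathbf Y}$ from the channel, ${\mathbf x}$ a sample from the posterior $P(\cdot\mid{\mathbf Y})$ and $\langle\cdot\rangle$ the posterior expectation, for any suitably integrable function $g$:

$$\mathbb{E}\,\big\langle g({\mathbf Y},{\mathbf x})\big\rangle=\mathbb{E}\,g({\mathbf Y},{\mathbf X}).$$

A typical use is to simplify the MMSE of ${\mathbf X}{\mathbf X}^{\intercal}$. Choosing $g({\mathbf Y},{\mathbf x})=\mathrm{Tr}\big({\mathbf x}{\mathbf x}^{\intercal}\langle{\mathbf x}{\mathbf x}^{\intercal}\rangle\big)$ gives $\mathbb{E}\,\mathrm{Tr}\big({\mathbf X}{\mathbf X}^{\intercal}\langle{\mathbf x}{\mathbf x}^{\intercal}\rangle\big)=\mathbb{E}\|\langle{\mathbf x}{\mathbf x}^{\intercal}\rangle\|_F^2$, so the cross term in the squared error collapses and

$$\mathbb{E}\|{\mathbf X}{\mathbf X}^{\intercal}-\langle{\mathbf x}{\mathbf x}^{\intercal}\rangle\|_F^2=\mathbb{E}\|{\mathbf X}{\mathbf X}^{\intercal}\|_F^2-\mathbb{E}\|\langle{\mathbf x}{\mathbf x}^{\intercal}\rangle\|_F^2.$$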