Fundamental limits of Non-Linear Low-Rank Matrix Estimation

Pierre Mergny, Justin Ko, Florent Krzakala, Lenka Zdeborová

TL;DR

It is shown that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function.

Abstract

We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function. We provide asymptotic characterization for the minimal achievable mean squared error (MMSE) and an approximate message-passing algorithm that reaches the MMSE under conditions analogous to the linear version of the problem. We also provide asymptotic errors achieved by methods such as principal component analysis combined with Bayesian denoising, and compare them with Bayes-optimal MMSE.
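As a small worked illustration of the scaling stated above, the sketch below evaluates the exponent $\frac 12 (1-1/k_F)$ for a few values of $k_F$. The function name `snr_exponent` is hypothetical and not taken from the paper; it only encodes the formula quoted in the abstract.

```python
from fractions import Fraction


def snr_exponent(k_F: int) -> Fraction:
    """Return a = (1/2) * (1 - 1/k_F), so that the required SNR grows as N**a.

    k_F is the index of the first non-zero Fisher information coefficient.
    (Hypothetical helper: it only restates the abstract's formula.)
    """
    if k_F < 1:
        raise ValueError("k_F must be a positive integer")
    return Fraction(1, 2) * (1 - Fraction(1, k_F))


# Exact exponents for the first few values of k_F.
for k in (1, 2, 3, 4):
    print(f"k_F = {k}: SNR grows as N^({snr_exponent(k)})")
```

Under this formula, $k_F = 1$ gives exponent $0$ (an SNR of order one), and the exponent increases towards $1/2$ as $k_F$ grows.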

Paper Structure (33 sections, 34 theorems, 113 equations, 2 figures)

Key Result

Theorem 3.1

Under Hyp. hyp:prior and hyp:bayes with the relevant scaling $\beta = \beta_{\mathrm{cr}}$ in Eq. eq:notation_rank1_mat, we have: where $\bm{Z} = (Z_{ij})_{ 1 \leqslant i,j \leqslant N}$ is a Wigner matrix with elements of variance $1/\Delta_{k_F}$.

Figures (2)

  • Figure 1: Matrix mean squared error for different estimators, for the model of Eq. eq:example_kf2 (left) and the model of Eq. eq:example_kf3 (right), as a function of the signal-to-noise parameter $\gamma_0$. Both panels show the optimal performance achieved by Approximate Message Passing (red). This is compared to the spectral method on the data matrix $\bm{Y}$ (green); the spectral method on the Fisher matrix $\bm{S}_{k_F}$ (cyan); and Gaussian denoising of the top eigenvector of $\bm{S}_{k_F}$ (purple). Note that the last method is very close to optimal. The empirical points (dots in both panels) are obtained by averaging over $30$ samples with $N=4000$. The phase transitions of the spectral methods are marked with vertical lines. For ${k_F}=2$ (left panel), there is no phase transition for the MMSE and AMP, since $m_{{k_F}=2} \neq 0$.
  • Figure 2: Difference between the largest and second largest eigenvalues (eigengap) of $\bm{Y}$ and $\bm{S}_{k_F}$, as a function of the parameter $\gamma_0$, for the model of Eq. eq:example_kf2 (left) and the model of Eq. eq:example_kf3 (right). The threshold for the appearance of an outlier, that is, the minimal value of $\gamma_0$ at which the eigengap becomes positive, is lower for the Fisher matrix than for the output matrix $\bm{Y}$ in both models. Each empirical point (dot) corresponds to one sample with $N=2000$. In the left panel, the output matrix is shifted by a constant to remove the non-informative Perron-Frobenius mode.
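
The eigengap diagnostic described in Figure 2 can be sketched on a toy example. The code below is only an illustration under stated assumptions: it uses the standard linear rank-one spiked Wigner model with a $\pm 1$ signal, not the paper's non-linear channels or its Fisher matrix $\bm{S}_{k_F}$, and the names `eigengap_spiked_wigner` and `snr` are hypothetical.

```python
import numpy as np


def eigengap_spiked_wigner(snr: float, n: int, rng: np.random.Generator) -> float:
    """Gap between the two largest eigenvalues of a rank-one spiked Wigner matrix.

    Assumed toy model (NOT the paper's non-linear model):
        Y = (snr / n) * x x^T + W / sqrt(n),
    with x uniform in {-1, +1}^n and W symmetric Gaussian noise whose
    off-diagonal entries have unit variance.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)           # symmetrize; off-diagonal variance 1
    y = (snr / n) * np.outer(x, x) + w / np.sqrt(n)
    eigvals = np.linalg.eigvalsh(y)        # eigenvalues in ascending order
    return float(eigvals[-1] - eigvals[-2])


rng = np.random.default_rng(0)
for snr in (0.5, 1.0, 2.0, 3.0):
    gap = eigengap_spiked_wigner(snr, n=400, rng=rng)
    print(f"snr = {snr:.1f}: eigengap = {gap:.3f}")
```

In this toy model the gap stays small at weak signal and becomes of order one once an outlier eigenvalue separates from the bulk, which is the qualitative behavior the figure tracks as a function of $\gamma_0$.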

Theorems & Definitions (39)

  • Theorem 3.1: Informal - Universal Spiked Decomposition of the Fisher matrix
  • Theorem 3.2: Informal - Universal Top Eigenvector Decomposition of the Fisher Matrix
  • Definition 3.1: Free Energy
  • Theorem 3.3: Gaussian Approximation of the Free Energy
  • Corollary 3.1: Universality of the Limiting Free Energy
  • Definition 3.2: Matrix Mean Squared Error
  • Theorem 3.4: Universality of the Overlaps
  • Corollary 3.2: Universality of the MMSE
  • Proposition 3.1: Performance of PCA of the Fisher Matrix
  • Theorem 3.5: Informal - Optimality of PCA for Non-Linear Channels
  • ...and 29 more