Fundamental limits of Non-Linear Low-Rank Matrix Estimation

Pierre Mergny; Justin Ko; Florent Krzakala; Lenka Zdeborová

TL;DR

It is shown that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function.

Abstract

We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose parameters are entirely determined by an expansion of the non-linear function. In particular, we show that to reconstruct the signal accurately, one requires a signal-to-noise ratio growing as $N^{\frac 12 (1-1/k_F)}$, where $k_F$ is the first non-zero Fisher information coefficient of the function. We provide asymptotic characterization for the minimal achievable mean squared error (MMSE) and an approximate message-passing algorithm that reaches the MMSE under conditions analogous to the linear version of the problem. We also provide asymptotic errors achieved by methods such as principal component analysis combined with Bayesian denoising, and compare them with Bayes-optimal MMSE.
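As a quick illustration of the scaling stated in the abstract, the sketch below evaluates the exponent $\frac 12 (1-1/k_F)$ for a few values of $k_F$. The function names `snr_exponent` and `snr_scale` are hypothetical helpers introduced here, not part of the paper; the code only restates the formula and ignores any constant prefactor.

```python
from fractions import Fraction


def snr_exponent(k_f: int) -> Fraction:
    """Exponent e such that the required signal-to-noise ratio grows as N**e.

    Restates the abstract's formula e = (1/2) * (1 - 1/k_F), where k_F >= 1 is
    the index of the first non-zero Fisher information coefficient.
    """
    if k_f < 1:
        raise ValueError("k_F must be a positive integer")
    return Fraction(1, 2) * (1 - Fraction(1, k_f))


def snr_scale(n: int, k_f: int) -> float:
    """Order of magnitude N**e of the required SNR (constant prefactor omitted)."""
    return float(n) ** float(snr_exponent(k_f))


if __name__ == "__main__":
    for k_f in (1, 2, 3, 10):
        print(f"k_F = {k_f}: exponent = {snr_exponent(k_f)}")
```

For $k_F = 1$ the exponent is $0$, so the required signal-to-noise ratio does not grow with $N$; for $k_F = 2$ and $k_F = 3$ the exponents are $1/4$ and $1/3$, and the exponent approaches $1/2$ as $k_F$ grows.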

Paper Structure (33 sections, 34 theorems, 113 equations, 2 figures)

This paper contains 33 sections, 34 theorems, 113 equations, 2 figures.

Introduction and Related Work
Main Contributions
Further Related Works
Model and Assumptions
Notations
Structure of the Model
Definitions
Assumptions
Examples
Main results
Decomposition of the Fisher Matrix and Its Top Eigenvector
Information-Theoretic Limit: Higher-Order Universality of the Free Energy and MMSE
Higher-Order Universality of the Free Energy
Higher-Order Universality of the Overlap and MMSE
Algorithmic Limit: Higher-Order Universality of Spectral Methods and AMP
...and 18 more sections

Key Result

Theorem 3.1

Under Hyp. hyp:prior and Hyp. hyp:bayes, with the relevant scaling $\beta = \beta_{\mathrm{cr}}$ in Eq. eq:notation_rank1_mat, we have: [display equation missing in source] where $\bm{Z} = (Z_{ij})_{1 \leqslant i,j \leqslant N}$ is a Wigner matrix with elements of variance $1/\Delta_{k_F}$.

Figures (2)

Figure 1: Matrix mean squared error of different estimators for the model of Eq. \ref{eq:example_kf2} (left) and the model of Eq. \ref{eq:example_kf3} (right), as a function of the signal-to-noise parameter $\gamma_0$. Both panels show the optimal performance achieved by Approximate Message Passing (red), compared with the spectral method on the data matrix $\bm{Y}$ (green), the spectral method on the Fisher matrix $\bm{S}_{k_F}$ (cyan), and Gaussian denoising of the top eigenvector of $\bm{S}_{k_F}$ (purple). Note that the last method is very close to optimal. The empirical points (dots in both panels) are averages over $30$ samples with $N=4000$. The phase transitions of the spectral methods are marked with vertical lines. For ${k_F}=2$ (left panel), there is no phase transition for the MMSE and AMP, since $m_{{k_F}=2} \neq 0$.
Figure 2: Difference between the largest and second-largest eigenvalues (eigengap) of $\bm{Y}$ and $\bm{S}_{k_F}$, as a function of the parameter $\gamma_0$, for the model of Eq. \ref{eq:example_kf2} (left) and the model of Eq. \ref{eq:example_kf3} (right). The threshold for the appearance of an outlier, that is, the minimal value of $\gamma_0$ at which the eigengap becomes positive, is lower for the Fisher matrix than for the output matrix $\bm{Y}$ in both models. Each empirical point (dot) corresponds to one sample with $N=2000$. In the left panel, the output matrix is shifted by a constant to remove the non-informative Perron-Frobenius mode.
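The eigengap plotted in Figure 2 can be illustrated with a minimal sketch. The code below is not the paper's experiment: it assumes the classical linear spiked Wigner model (a rank-one spike plus Gaussian Wigner noise with a Rademacher signal) as a stand-in for the non-linear channel, and it does not construct the Fisher matrix $\bm{S}_{k_F}$. The names `spiked_wigner` and `eigengap`, the parameter `snr`, and the matrix size are all hypothetical choices made for illustration.

```python
import numpy as np


def spiked_wigner(n: int, snr: float, rng: np.random.Generator) -> np.ndarray:
    """Return Y = (snr / n) * x x^T + Z for a Rademacher signal x.

    Z is a symmetric Gaussian (Wigner) matrix whose off-diagonal entries have
    variance 1/n, so its bulk spectrum fills roughly [-2, 2] for large n.
    This linear model is an assumption used only for illustration.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.normal(size=(n, n))
    z = (g + g.T) / np.sqrt(2 * n)
    return (snr / n) * np.outer(x, x) + z


def eigengap(mat: np.ndarray) -> float:
    """Difference between the largest and second-largest eigenvalues."""
    eigvals = np.linalg.eigvalsh(mat)  # eigenvalues in ascending order
    return float(eigvals[-1] - eigvals[-2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for snr in (0.2, 0.5, 1.0, 2.0, 3.0):
        gap = eigengap(spiked_wigner(400, snr, rng))
        print(f"snr = {snr:.1f}: eigengap = {gap:.3f}")
```

In this stand-in model the gap stays small when the spike is weak and opens once the spike strength passes a threshold, which is the qualitative behaviour the figure reports for $\bm{Y}$ and $\bm{S}_{k_F}$ as $\gamma_0$ increases.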

Theorems & Definitions (39)

Theorem 3.1: Informal - Universal Spiked Decomposition of the Fisher matrix
Theorem 3.2: Informal - Universal Top Eigenvector Decomposition of the Fisher Matrix
Definition 3.1: Free Energy
Theorem 3.3: Gaussian Approximation of the Free Energy
Corollary 3.1: Universality of the Limiting Free Energy
Definition 3.2: Matrix Mean Squared Error
Theorem 3.4: Universality of the Overlaps
Corollary 3.2: Universality of the MMSE
Proposition 3.1: Performance of PCA of the Fisher Matrix
Theorem 3.5: Informal - Optimality of PCA for Non-Linear Channels
...and 29 more
