Unrolled denoising networks provably learn optimal Bayesian inference

Aayush Karan; Kulin Shah; Sitan Chen; Yonina C. Eldar

TL;DR

This work proves the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP). For compressed sensing, it shows that when the network is trained on data drawn from a product prior, its layers approximately converge to the same denoisers used in Bayes AMP.

Abstract

Much of Bayesian inference centers around the design of estimators for inverse problems which are optimal assuming the data comes from a known prior. But what do these optimality guarantees mean if the prior is unknown? In recent years, algorithm unrolling has emerged as deep learning's answer to this age-old question: design a neural network whose layers can in principle simulate iterations of inference algorithms and train on data generated by the unknown prior. Despite its empirical success, however, it has remained unclear whether this method can provably recover the performance of its optimal, prior-aware counterparts. In this work, we prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP). For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network approximately converge to the same denoisers used in Bayes AMP. We also provide extensive numerical experiments for compressed sensing and rank-one matrix estimation demonstrating the advantages of our unrolled architecture: in addition to being able to obliviously adapt to general priors, it exhibits improvements over Bayes AMP in more general settings of low dimensions, non-Gaussian designs, and non-product priors.
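To make the idea of "layers that simulate iterations of an inference algorithm" concrete, here is a minimal NumPy sketch of AMP for compressed sensing, written so that each loop iteration plays the role of one network layer. It is an illustration under stated assumptions, not the paper's LDNet architecture: the function names (`z2_denoiser`, `z2_denoiser_grad`, `amp_z2`) are hypothetical, the sensing matrix is assumed to have i.i.d. $N(0, 1/m)$ entries, the noise level $\tau$ is estimated empirically as $\|z\|/\sqrt{m}$, and the denoiser is fixed to the posterior mean for a signal uniform on $\{-1, +1\}$ (the $\mathbb{Z}_2$ prior). In an unrolled network, that fixed denoiser would be replaced by a learnable function per layer.

```python
import numpy as np

def z2_denoiser(r, tau):
    """Posterior mean E[x | x + tau * g = r] for x uniform on {-1, +1}, g ~ N(0, 1)."""
    return np.tanh(r / tau**2)

def z2_denoiser_grad(r, tau):
    """Derivative of z2_denoiser with respect to r (needed for the Onsager term)."""
    return (1.0 - np.tanh(r / tau**2) ** 2) / tau**2

def amp_z2(y, A, num_layers=15):
    """Run num_layers AMP iterations for y = A x + noise (illustrative sketch).

    Assumes A has i.i.d. N(0, 1/m) entries; each loop pass is one 'layer'.
    """
    m, n = A.shape
    delta = m / n                      # measurement ratio
    x = np.zeros(n)                    # initial signal estimate
    z = y.copy()                       # initial residual
    for _ in range(num_layers):
        # Empirical estimate of the effective noise level (floored for safety).
        tau = max(np.linalg.norm(z) / np.sqrt(m), 1e-8)
        r = x + A.T @ z                # pseudo-observation: signal plus noise
        x = z2_denoiser(r, tau)        # a learnable denoiser would go here
        # Onsager correction: previous residual times the average derivative.
        onsager = (z / delta) * np.mean(z2_denoiser_grad(r, tau))
        z = y - A @ x + onsager
    return x
```

The Onsager correction is what distinguishes AMP from plain iterative thresholding; keeping it in the layer is what lets the pseudo-observation `r` behave like the signal plus Gaussian noise, so that a scalar denoiser is the natural object to learn.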

Paper Structure (39 sections, 10 theorems, 64 equations, 8 figures, 2 algorithms)

Introduction
Related work
  Theory for unrolling ISTA.
  Unrolled AMP.
  Other learning-based approaches.
  Other theory for unrolling.
Preliminaries on Bayes AMP and unrolling
  Compressed sensing
  Rank-one matrix estimation
Unrolling Bayes AMP
  Architecture
    Compressed sensing.
    Rank-one matrix estimation.
  Training
Provably learning Bayes AMP
...and 24 more sections

Key Result

Theorem 1

For compressed sensing with Gaussian sensing matrix, if the prior on the signal is a product distribution with smooth, sub-Gaussian marginals, then an unrolled network based on AMP which is trained with gradient descent on polynomially many samples will converge in polynomially many iterations to an

Figures (8)

Figure 1: LDNet for Compressed Sensing. On the left, we plot the NMSE (in dB) obtained by LDNet and Bayes AMP baselines on the Bernoulli-Gaussian prior. On the right, we plot NMSE (not in dB) achieved on the $\mathbb{Z}_2$ prior. LDNet (along with the guided denoisers) achieves virtually identical performance to the conjectured computationally optimal Bayes AMP.
Figure 2: Learned Denoisers for Compressed Sensing. We plot layerwise denoising functions learned by LDNet on the Bernoulli-Gaussian and $\mathbb{Z}_2$ priors relative to their optimal denoisers over a range of inputs in $(-2, 2)$. The state evolution input $\tau_{\ell}$ to each denoiser is set to be its empirical estimate.
Figure 3: Learned B with Decreasing Dimension. We hold $\delta = \frac{1}{2}$ fixed while scaling $m$ from $200$ down to $100$. Plots show NMSE (dB) performance of unrolling denoisers and learning $\mathbf{B}$ vs. Bayes AMP for randomly drawn measurement matrices. There is an increasing gap in performance as $m$ decreases.
Figure 4: Non-Gaussian Measurements. On the left, we plot LDNet with learnable $B$ compared to several baselines for a random truncated orthogonal measurement matrix, and on the right, for a random truncated Gram matrix. LDNet outperforms the other baselines in NMSE as well as convergence.
Figure 5: LDNet for Rank-One Matrix Estimation. On the left, we plot the NMSE obtained by LDNet and Bayes AMP on the Gaussian prior; on the right, the same comparison on the $\mathbb{Z}_2$ prior. LDNet matches Bayes AMP with slightly quicker convergence.
...and 3 more figures
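
Several captions above report NMSE, sometimes in dB. The summary does not state the exact definition used, so the following sketch assumes the common convention $\mathrm{NMSE} = \|\hat{x} - x\|^2 / \|x\|^2$ and $\mathrm{NMSE}_{\mathrm{dB}} = 10 \log_{10} \mathrm{NMSE}$; the function names are illustrative.

```python
import numpy as np

def nmse(x_hat, x):
    """Normalized mean squared error (assumed convention: error energy / signal energy)."""
    return np.sum((x_hat - x) ** 2) / np.sum(x ** 2)

def nmse_db(x_hat, x):
    """NMSE on a decibel scale; more negative values mean a better estimate."""
    return 10.0 * np.log10(nmse(x_hat, x))
```

Under this convention, the all-zeros estimate has NMSE 1, which is 0 dB, and every factor-of-ten reduction in error energy lowers the value by 10 dB.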

Theorems & Definitions (17)

Theorem 1: Informal, see Theorem \ref{thm:formal-thm}
Lemma 1: Asymptotic characterization of AMP iterates (bayati2011dynamics)
Definition 1: Scalar function complexity (allen2019learning)
Theorem 2
Lemma 2: Learning the denoiser within $L_2$ error
Lemma 3: Theorem 1 of allen2019learning
Proof: Proof of Lemma \ref{lem:l2-error-learning-finite-dim}
Lemma 4
proof
Lemma 5
...and 7 more
