Multi-layer State Evolution Under Random Convolutional Design

Mara Daniels; Cédric Gerbelot; Florent Krzakala; Lenka Zdeborová

TL;DR

The paper addresses recovering signals from multi-layer generative priors with convolutional layers by analyzing Multi-layer AMP (ML-AMP) under random multi-channel convolutional (MCC) designs. It shows a universality result: the state evolution (SE) describing ML-AMP with MCC weights matches the SE for dense Gaussian weights up to a rescaling. The proof embeds MCC matrices into block-Gaussian structures via a permutation and then applies spatially coupled SE techniques. This yields precise performance predictions and justifies using structured, efficient MCC matrices in place of fully dense matrices, with empirical validation on sparse and multi-layer priors. The findings support scalable, theoretically principled inference with convolutional priors, with applications in computational imaging and neural-prior-based recovery.

Abstract

Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d. weights, this was achieved by the multi-layer approximate message passing (ML-AMP) algorithm via a rigorous state evolution. However, practical generative priors are typically convolutional, allowing for computational benefits and inductive biases, and so the Gaussian i.i.d. weight assumption is very limiting. In this paper, we overcome this limitation and establish the state evolution of ML-AMP for random convolutional layers. We prove in particular that random convolutional layers belong to the same universality class as Gaussian matrices. Our proof technique is of independent interest as it establishes a mapping between convolutional matrices and spatially coupled sensing matrices used in coding theory.

Paper Structure (21 sections, 5 theorems, 58 equations, 7 figures, 1 algorithm)

Introduction
Related Work
Definition of the problem
Multi-channel Convolutional Matrices
Thermodynamic-like Limit and Finite-size Regimes
Multi-layer AMP
Computational Savings of MCC Matrices
Main result
Proof Sketch
Numerical Experiments
Discussion and Future Work
Proof of the main theorem
Notations and definitions
State evolution for generic multilayer AMP iterations with matrix valued variables and dense Gaussian matrices
State evolution for multilayer AMP iterations with random convolutional matrices
...and 6 more sections

Key Result

Theorem 4.2

Under the set of assumptions (A1)-(A4), for any sequences of uniformly pseudo-Lipschitz functions $\psi^{N}_{1},\psi^{N}_{2}$ of order $k$, for any $1 \leqslant l \leqslant L$ and any $t \in \mathbb{N}$, the following holds where $Z^{l}(t) \sim \mathcal{N}(0,\kappa^{l}(t))$, $\hat{Z}^{l}(t) \sim \mathcal{N}(0,\hat{\kappa}^{l}(t))$ are independent random variables.
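
To make the notion of a state evolution concrete, the following is a minimal sketch for the single-layer compressive sensing setting of Figure 1, not the multi-layer recursion of Theorem 4.2. It assumes the standard Bayes-optimal AMP recursion for a dense design with $W_{ij} \sim \mathcal{N}(0, 1/P)$ and aspect ratio $\beta$: the denoiser sees the signal through effective noise $\tau_t = (\sigma^2 + E_t)/\beta$, and $E_{t+1}$ is the posterior-mean error at that noise level. The recursion, the Monte Carlo evaluation, and the names `bg_posterior_mean` and `state_evolution` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def bg_posterior_mean(r, tau, rho):
    """E[x | r] for x ~ rho*N(0,1) + (1-rho)*delta_0 observed as r = x + sqrt(tau)*z."""
    # Log-odds of the "zero" component against the "Gaussian" component given r.
    log_odds = (np.log((1 - rho) / rho)
                + 0.5 * np.log((1 + tau) / tau)
                - 0.5 * r**2 * (1 / tau - 1 / (1 + tau)))
    p_nonzero = 1.0 / (1.0 + np.exp(log_odds))
    # Given a nonzero component, the posterior mean is the linear shrinkage r/(1+tau).
    return p_nonzero * r / (1 + tau)

def state_evolution(beta, rho, noise_var, n_iter=50, n_samples=200_000, seed=0):
    """Scalar recursion E_t -> E_{t+1} (assumed single-layer form), by Monte Carlo."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples) * (rng.random(n_samples) < rho)
    z = rng.normal(size=n_samples)
    mse = rho  # error of the trivial estimate x_hat = 0, i.e. E[x^2]
    history = []
    for _ in range(n_iter):
        tau = (noise_var + mse) / beta          # effective noise at this iteration
        x_hat = bg_posterior_mean(x + np.sqrt(tau) * z, tau, rho)
        mse = float(np.mean((x_hat - x) ** 2))  # predicted per-coordinate MSE
        history.append(mse)
    return history

# Parameters follow Figure 1: rho = 0.25 and noise variance 1e-4.
mse_easy = state_evolution(beta=0.9, rho=0.25, noise_var=1e-4)
mse_hard = state_evolution(beta=0.1, rho=0.25, noise_var=1e-4)
```

Under these assumptions the recursion settles at a small error for a large aspect ratio and stays near the prior variance for a small one. The paper's result is stronger than agreement of such fixed points: the MCC and dense Gaussian recursions coincide at every iteration, up to the rescaling stated in the TL;DR.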

Figures (7)

Figure 1: Agreement between the performance of the AMP algorithm run with random multichannel convolutional matrices and its state evolution as proven in this paper. (left) Compressive sensing $y_0 = W x_0 + \zeta$ for noise $\zeta_i \sim \mathcal{N}(0, 10^{-4})$ and signal prior $x_0 \sim \rho \mathcal{N}(0, 1) + (1-\rho) \delta(x)$, where $W \in \mathbb{R}^{Dq \times Pq}$ has varying aspect ratio $\beta = D / P$. Crosses correspond to AMP evaluations for $W \sim \text{MCC}(D, P, q, k)$ according to Definition 3.2, averaged over 10 independent trials. Dots correspond to AMP evaluations for $W \in \mathbb{R}^{D \times P}$ with i.i.d. Gaussian entries $W_{ij} \sim \mathcal{N}(0, 1/P)$. Lines show the state evolution predictions when $W_{ij} \sim \mathcal{N}(0,1/Pq)$. The system size is $P = 1024$, $q=1024$, $k=3$, where $\beta$ and $D = \beta P$ vary. While our theorem treats the limit $P, D \to \infty$, $q, k = O(1)$, we observe strong empirical agreement even when $q \sim P$. In the appendix we give the same figure for $q=10 \ll P$. (right) AMP iterates at $\rho = 0.25$ and $\beta$ near the recovery transition. Rather than showing these models have equivalent fixed points, we show a stronger result: the state evolution equations are equivalent at each iteration.
Figure 2: MCC matrices operate on $Pq$ dimensional input data, composed of $q$-dimensional signals for each of $P$ separate channels. The $i$-th output channel is a linear combination of convolutional features extracted from input channels, where $k$ is the convolutional filter size: $y^{(i)} = \sum_{j = 1 \ldots P} C_{ij} x^{(j)}$. Blue boxes show linear dependencies between signal coordinates.
Figure 3: System sizes for convolutional layers in a DCGAN architecture used to generate LSUN images (Radford et al., 2015). These are not directly comparable to MCC matrices, as DCGAN uses fractionally strided convolutions, which can be thought of as a composition of an MCC matrix with superresolution. However, they give a reasonable picture of the sizes of typical layers in convolutional neural networks.
Figure 4: A sketch of the permutation lemma applied to matrix $W \sim \text{MCC}(4, 3, 3, 2)$. Left: $W$ before permutation. Right: after permutation, $U W \tilde{U}^T$.
Figure 5: ML-AMP compressive sensing recovery under multichannel convolutional designs (crosses) and the state evolution for the corresponding fully connected model (lines). For comparison, we also plot the corresponding fully connected AMP iterations (dots), in which $W^{(l)} \in \mathbb{R}^{D_l \times P_l}$ with $W_{ij} \sim \mathcal{N}(0, 1/P_l)$, with the dimensions of the prior and output channel adjusted appropriately. Left: For $2 \leq l \leq L$, the channel functions are $\varphi^{(l)}(z; \zeta) = z + \zeta$ where $\zeta_i \sim \mathcal{N}(0, \sigma^2)$. Right: For $2 \leq l \leq L$, the channel functions are $\varphi^{(l)}(z; \zeta) = \max(z, 0)$ where the maximum is applied coordinatewise. This channel function is the popular ReLU activation function used by generative convolutional neural networks such as those in (Radford et al., 2015; Bora et al., 2017).
...and 2 more figures
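
The block structure described in Figure 2 and the permutation sketched in Figure 4 can be illustrated with a small NumPy example. This is a hedged sketch, not the paper's construction verbatim: it assumes each block $C_{ij}$ is a $q \times q$ circulant matrix whose filter has $k$ i.i.d. Gaussian taps of variance $1/(Pk)$ (an assumed normalization), and it uses one natural choice of permutations, reordering indices from channel-major to position-major, which may differ in convention from the paper's $U$ and $\tilde{U}$. The function names are hypothetical.

```python
import numpy as np

def gaussian_circulant(q, k, var, rng):
    """q x q circulant block built from k i.i.d. N(0, var) filter taps."""
    taps = np.zeros(q)
    taps[:k] = rng.normal(0.0, np.sqrt(var), size=k)
    # Row a is the filter cyclically shifted by a positions.
    return np.stack([np.roll(taps, a) for a in range(q)])

def sample_mcc(D, P, q, k, rng):
    """Dq x Pq matrix made of D x P independent circulant blocks C_ij."""
    var = 1.0 / (P * k)  # assumed scaling: each row has P*k nonzero entries
    blocks = [[gaussian_circulant(q, k, var, rng) for _ in range(P)]
              for _ in range(D)]
    return np.block(blocks)

def position_major_order(n_channels, q):
    """Index order that groups coordinates by position instead of by channel."""
    idx = np.arange(n_channels * q).reshape(n_channels, q)
    return idx.T.reshape(-1)

def permute_mcc(W, D, P, q):
    """Reorder rows and columns so W becomes a q x q grid of D x P blocks."""
    rows = position_major_order(D, q)
    cols = position_major_order(P, q)
    return W[np.ix_(rows, cols)]

# Same sizes as the example in Figure 4: MCC(4, 3, 3, 2).
rng = np.random.default_rng(0)
D, P, q, k = 4, 3, 3, 2
W = sample_mcc(D, P, q, k, rng)
W_perm = permute_mcc(W, D, P, q)
```

After the reordering, block $(a, b)$ of `W_perm` is a dense $D \times P$ Gaussian matrix when $(b - a) \bmod q < k$ and is zero otherwise, and it depends only on the offset $(b - a) \bmod q$. This banded, block-Gaussian layout is the kind of structure that connects MCC matrices to the spatially coupled sensing matrices mentioned in the abstract.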

Theorems & Definitions (14)

Definition 3.1: Gaussian i.i.d. Convolution
Definition 3.2: Multi-channel Gaussian i.i.d. Convolution
Definition 4.1: State Evolution
Theorem 4.2
Lemma 4.3: Permutation Lemma
proof
Definition A.1: pseudo-Lipschitz function
Theorem A.2
proof
Lemma A.3
...and 4 more
