On Generalization Bounds for Deep Compound Gaussian Neural Networks

Carter Lyons, Raghu G. Raj, Margaret Cheney

TL;DR

This work analyzes generalization in unrolled, compound Gaussian-prior neural networks for linear inverse problems. It develops a generalization error bound by bounding the Rademacher complexity through Dudley’s inequality, leveraging Lipschitz properties of the network’s scale-variable updates and Tikhonov solutions. The bound is specialized to two realizations, CG-Net and DR-CG-Net, revealing that DR-CG-Net provably incurs a tighter bound and scales more favorably with network size and signal dimension than CG-Net. The results provide theoretical guarantees for CG-informed unrolled networks, with implications for training efficiency on limited data and potential PAC-Bayes extensions for even stronger small-data guarantees.
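
To make the central quantity concrete, here is a minimal sketch of a Monte Carlo estimate of the empirical Rademacher complexity for a small, finite function class. It is an illustration only and not the paper's procedure: the function name `empirical_rademacher`, the toy classes, and the number of draws are all assumptions introduced here. Each function is represented by its vector of outputs on the training samples.

```python
import itertools
import random


def empirical_rademacher(outputs, num_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_f (1/N) * sum_i sigma_i * f(x_i) ].

    outputs[f][i] is the value of function f on training sample i.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    rng = random.Random(seed)
    n = len(outputs[0])
    total = 0.0
    for _ in range(num_draws):
        # Draw i.i.d. Rademacher signs, one per training sample.
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Supremum over the (finite) class of the signed empirical average.
        total += max(sum(s * v for s, v in zip(sigma, f)) for f in outputs) / n
    return total / num_draws


# A small class: a constant function and its negation on 4 samples.
small_class = [[1.0] * 4, [-1.0] * 4]

# A rich class: every sign pattern on 4 samples, so it can fit any noise.
rich_class = [list(p) for p in itertools.product((-1.0, 1.0), repeat=4)]

small_value = empirical_rademacher(small_class)
rich_value = empirical_rademacher(rich_class)
```

The rich class can match every sign pattern, so its estimate is exactly 1, while the two-function class yields a markedly smaller value. A generalization bound of the kind summarized above controls this quantity for the class of network estimates rather than for a toy class.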

Abstract

Algorithm unfolding or unrolling is the technique of constructing a deep neural network (DNN) from an iterative algorithm. Unrolled DNNs often provide better interpretability and superior empirical performance over standard DNNs in signal estimation tasks. An important theoretical question, which has only recently received attention, is the development of generalization error bounds for unrolled DNNs. These bounds deliver theoretical and practical insights into the performance of a DNN on empirical datasets that are distinct from, but sampled from, the probability density generating the DNN training data. In this paper, we develop novel generalization error bounds for a class of unrolled DNNs that are informed by a compound Gaussian prior. These compound Gaussian networks have been shown to outperform comparative standard and unfolded deep neural networks in compressive sensing and tomographic imaging problems. The generalization error bound is formulated by bounding the Rademacher complexity of the class of compound Gaussian network estimates with Dudley's integral. Under realistic conditions, we show that, at worst, the generalization error scales as $\mathcal{O}(n\sqrt{\ln(n)})$ in the signal dimension and $\mathcal{O}((\text{Network Size})^{3/2})$ in network size.
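
For orientation, the generic chain behind a bound of this type can be written as below. This is the textbook form and not the paper's exact statement: the hypothesis class $\mathcal{H}$, the loss $\ell$ bounded by $B$, the confidence level $\delta$, the covering number $\mathcal{N}(\mathcal{F}, \varepsilon)$ (taken with respect to the normalized empirical $\ell_2$ metric on the training samples), and the absolute constants $C_1, C_2, C_3$ are assumptions introduced here, and the constants differ between references. With probability at least $1 - \delta$ over a training set of size $N_s$,

$$
\sup_{h \in \mathcal{H}} \left( \mathbb{E}\big[\ell(h)\big] - \frac{1}{N_s} \sum_{i=1}^{N_s} \ell\big(h; \overline{\bm{y}}_i, \overline{\bm{c}}_i\big) \right) \leq C_1\, \widehat{\mathcal{R}}_{\mathcal{S}}(\ell \circ \mathcal{H}) + C_2\, B \sqrt{\frac{\ln(1/\delta)}{N_s}},
$$

where the empirical Rademacher complexity of a function class $\mathcal{F}$ is controlled by Dudley's entropy integral:

$$
\widehat{\mathcal{R}}_{\mathcal{S}}(\mathcal{F}) = \mathbb{E}_{\bm{\sigma}} \sup_{f \in \mathcal{F}} \frac{1}{N_s} \sum_{i=1}^{N_s} \sigma_i\, f\big(\overline{\bm{y}}_i, \overline{\bm{c}}_i\big) \leq \frac{C_3}{\sqrt{N_s}} \int_0^{\infty} \sqrt{\ln \mathcal{N}(\mathcal{F}, \varepsilon)}\; d\varepsilon .
$$

In arguments of this kind, Lipschitz continuity of the network output in its parameters is typically what allows the covering number of the function class to be bounded by a covering number of the parameter set, which is how dependence on signal dimension and network size enters the final bound.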

Paper Structure

This paper contains 26 sections, 23 theorems, 116 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{S} = \{(\overline{\bm{y}}_i, \overline{\bm{c}}_i)\}_{i = 1}^{N_s}$ be a training dataset where each $(\overline{\bm{c}}_i, \overline{\bm{y}}_i)$ is given by (eqn:linear_msrmt) and define $y_{\max} = \max_{1\leq i\leq N_s} \lVert\overline{\bm{y}}_i\rVert_2.$ If Assumption assumption:dat for $\dim(\mathcal{P}) = 1, n, 2n-1,$ or $n(n+1)/2$ when $\mathcal{P} = \mathcal{P}_{\textnormal{co

Figures (1)

  • Figure 1: End-to-end network structure for G-CG-Net, the unrolled deep neural network of Algorithm \ref{alg:CG-LS}, is shown in (\ref{fig:DR-CG-Net}). G-CG-Net consists of an input block, $L_0$, initialization block, $\mathcal{Z}_0$, $K+1$ Tikhonov blocks, $U_k$, output block, $O$, and $K$ complete scale variable mappings, $\mathcal{Z}_k$, with structure in (\ref{fig:DR-CG-Net scale mapping module}). Each $\mathcal{Z}_k$ consists of $J$ scale variable updates $Z_k^{(j)}$.
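
The block structure described in the Figure 1 caption can be sketched as a simple enumeration. The block labels come from the caption, but the forward ordering and the index ranges used here (Tikhonov blocks numbered $U_0$ through $U_K$, scale variable mappings numbered $\mathcal{Z}_1$ through $\mathcal{Z}_K$) are assumptions made for illustration, as is the helper name `g_cg_net_blocks`.

```python
def g_cg_net_blocks(K, J):
    """List G-CG-Net block labels in an assumed forward order.

    K: number of complete scale variable mappings Z_k.
    J: number of scale variable updates Z_k^(j) inside each mapping.
    (Hypothetical helper; ordering and indexing are assumptions.)
    """
    # Input block, initialization block, and the first Tikhonov block.
    blocks = ["L_0", "Z_0", "U_0"]
    for k in range(1, K + 1):
        # Each complete scale variable mapping unrolls J updates ...
        blocks += [f"Z_{k}^({j})" for j in range(1, J + 1)]
        # ... and is followed by a Tikhonov block.
        blocks.append(f"U_{k}")
    # Output block.
    blocks.append("O")
    return blocks


blocks = g_cg_net_blocks(K=2, J=3)
num_tikhonov = sum(1 for b in blocks if b.startswith("U_"))
num_scale_updates = sum(1 for b in blocks if "^(" in b)
```

Under these assumptions the count of Tikhonov blocks is $K + 1$ and the count of scale variable updates is $KJ$, so the number of unrolled blocks grows linearly in both $K$ and $J$.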

Theorems & Definitions (41)

  • Theorem 1: Generalization Error Bound for G-CG-Net
  • Theorem 2: Generalization Error Bound for CG-Net
  • Corollary 3
  • Theorem 4: Generalization Error Bound for DR-CG-Net
  • Corollary 5
  • Proposition 6
  • proof
  • Lemma 7
  • proof
  • Lemma 8
  • ...and 31 more