Deep End-to-End Posterior ENergy (DEEPEN) for image recovery

Jyothi Rikhab Chand, Mathews Jacob

TL;DR

DEEPEN introduces an end-to-end learned energy-based model to represent the posterior in MRI image reconstruction, enabling both MAP estimation and posterior sampling. By modeling the prior with a neural energy ${\mathcal{E}}_{\boldsymbol{\theta}}$ and training via maximum likelihood, it yields a negative log-posterior ${\mathcal{L}}_{\boldsymbol{\theta}}({\boldsymbol{x}}) = \frac{1}{2}\|\mathbf{A}{\boldsymbol{x}}-\boldsymbol{b}\|^2 + \mathcal{E}_{\boldsymbol{\theta}}({\boldsymbol{x}}) + \log \tilde{Z}_{\boldsymbol{\theta}}$, and supports Langevin-based posterior sampling with an efficient gradient-based update. The method avoids algorithm unrolling, does not impose contraction constraints, and demonstrates improved MAP reconstruction over prior E2E and PnP approaches, plus faster sampling than diffusion models with substantially fewer parameters. Empirical results on fastMRI data show robust generalization to unseen acquisition settings, competitive reconstruction quality across acceleration factors, and meaningful uncertainty estimates from generated samples. Overall, DEEPEN provides a scalable, memory-efficient pathway to both high-quality image recovery and uncertainty quantification in MRI, with practical implications for adaptive acquisition and clinical decision-making.
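
As a rough illustration of the Langevin-based sampling described above, the following Python sketch runs a standard unadjusted Langevin iteration on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the forward operator $\mathbf{A}$ is a diagonal sampling mask, the learned energy $\mathcal{E}_{\boldsymbol{\theta}}$ is replaced by the quadratic $\frac{\lambda}{2}\|\boldsymbol{x}\|^2$, and the step size and chain length are arbitrary. The paper's exact update rule may differ. The term $\log \tilde{Z}_{\boldsymbol{\theta}}$ does not depend on $\boldsymbol{x}$, so it drops out of the gradient.

```python
import math
import random

def grad_neg_log_posterior(x, mask, b, lam):
    """Gradient of 0.5*||A x - b||^2 + E(x) with respect to x."""
    # Assumption: A = diag(mask) stands in for the MRI forward operator.
    # Assumption: E(x) = 0.5*lam*||x||^2 stands in for the learned CNN energy.
    return [m * (m * xi - bi) + lam * xi for xi, m, bi in zip(x, mask, b)]

def langevin_samples(x0, mask, b, lam, step, n_steps, rng):
    """Unadjusted Langevin: x <- x - step*grad + sqrt(2*step)*noise."""
    x = list(x0)
    chain = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_neg_log_posterior(x, mask, b, lam)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        chain.append(x)
    return chain

rng = random.Random(0)
mask = [1.0, 0.0, 1.0]   # the second entry is not measured
b = [2.0, 0.0, -1.0]     # toy measurements
lam = 0.5
chain = langevin_samples([0.0, 0.0, 0.0], mask, b, lam,
                         step=0.05, n_steps=40000, rng=rng)
kept = chain[2000:]      # discard burn-in iterations
post_mean = [sum(s[i] for s in kept) / len(kept) for i in range(3)]
```

For this Gaussian toy posterior the exact mean is `mask[i] * b[i] / (mask[i]**2 + lam)`, so `post_mean` can be checked against it. With a learned energy, only the gradient function would change.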

Abstract

Current end-to-end (E2E) and plug-and-play (PnP) image reconstruction algorithms approximate the maximum a posteriori (MAP) estimate but, unlike diffusion models, cannot sample from the posterior distribution. Diffusion models, in turn, are difficult to train in an E2E fashion. This paper introduces a Deep End-to-End Posterior ENergy (DEEPEN) framework, which enables both MAP estimation and sampling. We learn the parameters of the negative log-posterior, which is the sum of the data-consistency error and the negative log-prior, using maximum likelihood optimization in an E2E fashion. The proposed approach does not require algorithm unrolling and hence has a smaller computational and memory footprint than current E2E methods; it also does not require the contraction constraints typically needed by current PnP methods. Our results demonstrate that DEEPEN outperforms current E2E and PnP models in the MAP setting and samples faster than diffusion models. In addition, the learned energy-based model is observed to be more robust to changes in image acquisition settings.
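
The maximum likelihood training mentioned above can be sketched, under strong simplifying assumptions, as a contrastive update: lower the energy of true samples and raise the energy of samples drawn from the current model. In the Python sketch below the one-parameter energy $\frac{\theta}{2}\|\boldsymbol{x}\|^2$, the hand-written batches, and the learning rate are all hypothetical stand-ins. In DEEPEN the energy is a CNN and the fake samples come from Langevin sampling. The fake samples are treated as constants, so no gradient flows through the sampler.

```python
def energy(x, theta):
    """Toy energy E_theta(x) = 0.5*theta*||x||^2 (stand-in for the CNN)."""
    return 0.5 * theta * sum(xi * xi for xi in x)

def energy_grad_theta(x):
    """Derivative of the toy energy with respect to theta."""
    return 0.5 * sum(xi * xi for xi in x)

def contrastive_loss_and_grad(true_batch, fake_batch, theta):
    """Mean energy of true samples minus mean energy of fake samples."""
    n_true, n_fake = len(true_batch), len(fake_batch)
    loss = (sum(energy(x, theta) for x in true_batch) / n_true
            - sum(energy(x, theta) for x in fake_batch) / n_fake)
    grad = (sum(energy_grad_theta(x) for x in true_batch) / n_true
            - sum(energy_grad_theta(x) for x in fake_batch) / n_fake)
    return loss, grad

true_batch = [[1.0, 0.0], [0.0, 1.0]]   # x+: toy "training images"
fake_batch = [[2.0, 0.0], [0.0, 2.0]]   # x-: would come from the sampler
theta = 1.0
loss, grad = contrastive_loss_and_grad(true_batch, fake_batch, theta)
theta_new = theta - 0.1 * grad          # one gradient-descent step on theta
```

Here the fake samples have larger norm than the true ones, so the step increases `theta`, which penalizes large-norm images more strongly.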

Paper Structure

This paper contains 23 sections, 1 theorem, 23 equations, 5 figures, 2 tables.

Key Result

Lemma 2.1

Consider the cost function $\mathcal{L}_{\theta}({\boldsymbol{x}})$ in (eq:posterior_modeled), which is bounded below by zero (the CNN implementation of $\mathcal{E}_\theta({\boldsymbol{x}})$ has an absolute-value function in the output layer, which makes the lower bound zero). Then the sequence of iterates $\

Figures (5)

  • Figure 1: Demonstration of the DEEPEN training procedure. Training determines the optimal weights of the energy $\mathcal{E}_{\boldsymbol{\theta}}(\cdot)$ by minimizing the energy difference between true and fake samples. The true samples ${\boldsymbol{x}}^+$ are obtained from the training data, while the fake samples ${\boldsymbol{x}}^-$ are generated using the Langevin sampling algorithm, highlighted by the yellow box. The intermediate results are not stored to evaluate the gradient of the loss; therefore, a single physical layer is used for forward propagation, which keeps the training memory demand low.
  • Figure 2: Comparison of pre-trained (MuSE) and E2E-trained (DEEPEN) energy models for (a) four-fold and (b) six-fold acceleration. The top row in each figure shows the original image, Gaussian noise, and the structural perturbation specified by ${\boldsymbol{x}}_{0}-{\boldsymbol{x}}$, where ${\boldsymbol{x}}_0$ denotes the SENSE solution. The second and third rows in each figure show the MuSE and DEEPEN energies, respectively, plotted as functions of $\alpha_z$ and $\alpha_s$, together with the corresponding reconstructed and error images. The images are reconstructed as the combination $\hat{{\boldsymbol{x}}} = {\boldsymbol{x}} + \alpha_s^{*}{\boldsymbol{s}} + \alpha_z^{*}{\boldsymbol{z}}$, where $(\alpha_s^{*},\alpha_z^{*})$ is the minimizer of the energy function (indicated by a cross mark in the contour plot). We note that the minimum of the DEEPEN energy is closer to $\alpha_s^{*}\approx 0;\alpha_z^{*}=0$, with differences ${\boldsymbol{x}}^*-{\boldsymbol{x}}$ smaller than those of MuSE. This shows that the DEEPEN energy is effective in suppressing both correlated structural perturbations and Gaussian noise.
  • Figure 3: Comparison of DEEPEN, ELDER, MuSE, PnP-ISTA, and MoL for three different acquisition settings on the fastMRI brain data set. The first, second, and third rows in each figure show the reconstructed image, the enlarged image, and the error image, respectively. The error image is scaled by a factor of 10 to highlight the differences. The first image in the third row shows the undersampling mask.
  • Figure 4: Generalization comparison of E2E-trained DEEPEN and ELDER models for two different settings. (a) Models are trained on a 4-fold 2D undersampling mask and tested on a 2-fold 1D undersampling mask. Compared with Fig. \ref{mse_cmp}(a), which shows the reconstruction performance of models trained and tested on the 4-fold 2D mask, the performance of DEEPEN and ELDER drops by $0.4$ dB and $1.85$ dB, respectively. (b) Models are trained on a 6-fold 2D mask and tested on a 2-fold 1D mask. Compared with Fig. \ref{mse_cmp}(b), which shows the performance of models trained and tested on the 6-fold 2D mask, the performance of DEEPEN improves by $1.38$ dB while that of ELDER drops by $0.5$ dB.
  • Figure 5: MAP, MMSE, and uncertainty estimates provided by the DEEPEN algorithm for two different MRI acquisition settings. We compare DEEPEN's sampling performance with that of DPS when a 2D undersampling mask is employed for (a) 4-fold and (b) 6-fold acceleration.
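
The MMSE and uncertainty estimates named in the Figure 5 caption are typically computed from posterior samples. A minimal Python sketch follows, assuming the MMSE estimate is approximated by the per-pixel sample mean and the uncertainty by the per-pixel sample standard deviation. The paper may define its uncertainty map differently, and the three two-pixel "samples" are made up for illustration.

```python
import math

def mmse_and_uncertainty(samples):
    """Per-pixel mean (MMSE estimate) and standard deviation of samples."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    std = [math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / n)
           for i in range(dim)]
    return mean, std

# Hypothetical posterior samples of a two-pixel image.
samples = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
mean, std = mmse_and_uncertainty(samples)
```

The first pixel takes the same value in every sample, so its uncertainty is zero. The second pixel varies across samples and receives a nonzero standard deviation.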

Theorems & Definitions (1)

  • Lemma 2.1