How Regularization Terms Make Invertible Neural Networks Bayesian Point Estimators

Nick Heilenkötter

TL;DR

This work shows that carefully designed regularization terms during the training of invertible neural networks can embed Bayesian priors into inverse problems, yielding forward-operator fidelity alongside principled, data-dependent reconstructions. A log-Jacobian-determinant penalty links to posterior-mean-like corrections via score-based denoising, while a divergence penalty induces MAP-like behavior through a prior-based normal equation. The authors provide rigorous derivations connecting these losses to classical estimators and validate the theory with 2D toy experiments using iResNets, demonstrating improved reconstruction quality and stable, interpretable data dependence. The results offer a principled pathway to incorporate Bayesian priors into forward and inverse maps learned by invertible architectures, with potential extensions to nonlinear and PDE-based settings and broader applications in imaging and inverse problems.
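
The two quantities named above, the log-determinant of a Jacobian and a divergence, can be illustrated on a toy invertible residual map. The following is a minimal sketch and not the authors' loss: the map `residual_map`, the matrix `W`, the `scale` parameter, and the finite-difference Jacobian are all assumptions made for illustration, and how the paper weights or combines these terms is not stated here.

```python
import numpy as np

def residual_map(x, W, scale=0.5):
    """Hypothetical iResNet-style layer phi(x) = x + g(x), g(x) = scale * tanh(W x).

    Since tanh is 1-Lipschitz, Lip(g) <= scale * ||W||_2; phi is invertible
    whenever this bound is below 1.
    """
    return x + scale * np.tanh(W @ x)

def jacobian_fd(f, x, h=1e-6):
    """Central finite-difference approximation of the Jacobian of f at x."""
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J

def logdet_and_divergence(f, x):
    """Return log|det J_f(x)| and the divergence of the residual g = f - id."""
    J = jacobian_fd(f, x)
    _, logabsdet = np.linalg.slogdet(J)   # numerically stable log|det J|
    div_residual = np.trace(J) - x.size   # trace(J_g) = trace(J_f) - n
    return logabsdet, div_residual
```

Calling `logdet_and_divergence(lambda x: residual_map(x, W), x0)` for a 2×2 matrix `W` and a point `x0` returns both scalars. For a small residual the two are related to first order, since $\log\det(I + B) \approx \operatorname{tr}(B)$ when $B$ is small.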

Abstract

Can regularization terms in the training of invertible neural networks lead to known Bayesian point estimators in reconstruction? Invertible networks are attractive for inverse problems due to their inherent stability and interpretability. Recently, optimization strategies for invertible neural networks that approximate either a reconstruction map or the forward operator have been studied from a Bayesian perspective, but each has limitations. To address this, we introduce and analyze two regularization terms for the network training that, upon inversion of the network, recover properties of classical Bayesian point estimators: while the first can be connected to the posterior mean, the second resembles the MAP estimator. Our theoretical analysis characterizes how each loss shapes both the learned forward operator and its inverse reconstruction map. Numerical experiments support our findings and demonstrate how these loss-term regularizers introduce data-dependence in a stable and interpretable way.

Paper Structure

This paper contains 13 sections, 4 theorems, 39 equations, 5 figures.

Key Result

Lemma 1

Since $p(x|y^\delta)=p(x|z^\delta)$ holds for $z^\delta=A^\ast y^\delta$, the posterior means for the original problem and for the normal equation coincide. Further, the posterior mean point estimator satisfies

Figures (5)

  • Figure 1: How the reconstruction methods resulting from the studied training strategies map an equidistant grid (gray) when $A = \mathrm{Id}$: a) approximation training induces no regularization; b) reconstruction training approximates the posterior mean; c) log‑determinant regularization links to a smoothed posterior mean; d) divergence‑based regularization approximates the MAP estimator, visibly pulling toward the peaks of the prior.
  • Figure 2: The Bayesian inverse problem framework: The forward model maps $x\sim p_X$ to its measurement, which is corrupted by noise. Given this noisy data, the posterior density is computed as the product of prior and likelihood.
  • Figure 3: Comparison of MAP and PM for a Gaussian denoising task (i.e. $A=\mathrm{Id}$). The grid visualizes how the data space is deformed by the reconstruction methods. The arrows depict the values of the gradient field at the connected dots – for implicit Euler, we therefore plot the gradients at the reconstructed points.
  • Figure 4: Reconstruction and forward operator approximation error (including noise) on the 2D dataset for $\varepsilon=\frac{1}{8}$: the divergence-regularized loss obtains small errors in both directions.
  • Figure 5: Grids reconstructed by the optimized networks for $\varepsilon=\frac{1}{2}$ and $\hat{\delta}=\delta$, depicted alongside the numerically computed posterior mean and MAP estimators. (For optimal visibility of details, please view the figure electronically and zoom in as needed.)
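
The difference between the posterior mean and the MAP estimator that Figures 3 and 5 visualize can be reproduced in a one-dimensional denoising toy problem ($A=\mathrm{Id}$). The sketch below is an illustration under stated assumptions, not the paper's experiment: the Gaussian-mixture prior with shared variance, the Gaussian noise of standard deviation `delta`, and all function names are hypothetical. The posterior mean uses Tweedie's formula $\mathbb{E}[x|y] = y + \delta^2 \nabla \log p_Y(y)$, and the MAP estimate uses a plain grid search.

```python
import numpy as np

def log_gmm(x, means, var, weights):
    """Log-density of a 1D Gaussian mixture with shared variance `var`."""
    x = np.asarray(x, dtype=float)[..., None]
    log_comp = (-0.5 * (x - means) ** 2 / var
                - 0.5 * np.log(2.0 * np.pi * var) + np.log(weights))
    m = log_comp.max(axis=-1, keepdims=True)  # log-sum-exp for stability
    return (m + np.log(np.exp(log_comp - m).sum(axis=-1, keepdims=True)))[..., 0]

def posterior_mean(y, means, var, weights, delta):
    """Tweedie's formula; p_Y is the same mixture with variance var + delta^2."""
    y = np.asarray(y, dtype=float)[..., None]
    var_y = var + delta ** 2
    log_comp = -0.5 * (y - means) ** 2 / var_y + np.log(weights)
    r = np.exp(log_comp - log_comp.max(axis=-1, keepdims=True))
    r /= r.sum(axis=-1, keepdims=True)            # component responsibilities
    score = (r * (means - y)).sum(axis=-1) / var_y  # d/dy log p_Y(y)
    return y[..., 0] + delta ** 2 * score

def map_estimate(y, means, var, weights, delta, grid):
    """Grid search for argmax_x  log p_X(x) - (x - y)^2 / (2 delta^2)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    log_prior = log_gmm(grid, means, var, weights)[None, :]
    log_lik = -0.5 * (grid[None, :] - y[:, None]) ** 2 / delta ** 2
    return grid[np.argmax(log_prior + log_lik, axis=1)]
```

With a bimodal prior, for example `means = np.array([-1.0, 1.0])`, `weights = np.array([0.5, 0.5])`, `var = 0.05`, `delta = 0.5`, a measurement near zero is pulled only slightly toward the nearer mode by `posterior_mean`, while `map_estimate` moves it almost onto that mode, which matches the qualitative behavior described in the caption of Figure 1.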

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Corollary 1