Solving Inverse Problems by Joint Posterior Maximization with Autoencoding Prior

Mario González, Andrés Almansa, Pauline Tan

TL;DR

The paper tackles ill-posed imaging inverse problems by leveraging a variational autoencoder (VAE) prior and introducing Joint Posterior Maximization (JPMAP) over the image and latent code. It demonstrates that the joint objective is quasi-bi-convex, enabling an alternating optimization that converges to a stationary point under mild assumptions, while training the VAE with a denoising criterion improves robustness to out-of-distribution inputs. A continuation scheme further stabilizes the optimization and helps recover robust MAP estimates across diverse degradations. Empirical results on denoising, interpolation, deblurring, super-resolution, and compressed sensing show JPMAP achieving higher restoration quality than competing non-convex MAP methods. The paper also discusses the role of encoder initialization and notes that scaling to higher-dimensional data depends on future improvements in generative models.

Abstract

In this work we address the problem of solving ill-posed inverse problems in imaging where the prior is a variational autoencoder (VAE). Specifically we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique (JPMAP) performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. We also highlight the importance of correctly training the VAE using a denoising criterion, in order to ensure that the encoder generalizes well to out-of-distribution images, without affecting the quality of the generative model. This simple modification is key to providing robustness to the whole procedure. Finally we show how our joint MAP methodology relates to more common MAP approaches, and we propose a continuation scheme that makes use of our JPMAP algorithm to provide more robust MAP estimates. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima.
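The alternating structure described in the abstract can be illustrated with a minimal sketch. It assumes a toy linear decoder `W` in place of the VAE decoder, a linear degradation `A` with Gaussian noise of standard deviation `sigma`, and a Gaussian coupling of width `gamma` between the image `x` and the decoded latent `W @ z`. All of these names and the objective below are illustrative assumptions, not the paper's implementation. In this linear-Gaussian toy setting each block update has a closed form, and the z-step computes exactly the posterior mean that an ideal encoder would return.

```python
import numpy as np

def joint_objective(x, z, A, y, W, sigma, gamma):
    """Toy joint negative log-posterior over (x, z) (assumed form)."""
    data = np.sum((A @ x - y) ** 2) / (2 * sigma**2)      # data fidelity
    coupling = np.sum((x - W @ z) ** 2) / (2 * gamma**2)  # x close to decoder output
    prior = 0.5 * np.sum(z**2)                            # standard Gaussian on z
    return data + coupling + prior

def alternate_minimization(A, y, W, sigma, gamma, n_iter=50):
    """Alternate exact minimization in z (x fixed) and in x (z fixed)."""
    d, k = W.shape
    x = A.T @ y          # crude initialization (hypothetical choice)
    z = np.zeros(k)
    # Both sub-problems are quadratic, so each step is a linear solve.
    Hx = A.T @ A / sigma**2 + np.eye(d) / gamma**2
    Hz = W.T @ W / gamma**2 + np.eye(k)
    energies = []
    for _ in range(n_iter):
        # z-step: stands in for the encoder-based update of the latent code.
        z = np.linalg.solve(Hz, W.T @ x / gamma**2)
        # x-step: quadratic problem for a Gaussian (log-concave) degradation.
        x = np.linalg.solve(Hx, A.T @ y / sigma**2 + W @ z / gamma**2)
        energies.append(joint_objective(x, z, A, y, W, sigma, gamma))
    return x, z, energies
```

Because each step minimizes the objective exactly over one block, the recorded energies are non-increasing. With a nonlinear decoder the z-step has no closed form, which is where an encoder-based approximation becomes useful.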

Paper Structure

This paper contains 35 sections, 8 theorems, 87 equations, 10 figures, 5 algorithms.

Key Result

Proposition 2.1

Let $\left\lbrace ({\bm{x}}_{n},{\bm{z}}_{n}) \right\rbrace$ be a sequence generated by Algorithm \ref{['alg:JPMAP3new']}. Under Assumption \ref{['functioncondition']} we have that:

Figures (10)

  • Figure 1: Evaluating the quality of the generative model as a function of $\sigma_{\operatorname{DVAE}}$. On (a) Denoising (Gaussian noise $\sigma=150$), (b) Compressed Sensing ($\sim 10.2\%$ measurements, noise $\sigma=10$) and (c) Interpolation ($80\%$ of missing pixels, noise $\sigma=10$). Results of both algorithms are computed on a batch of 50 images and initialising on ground truth ${\bm{x}}^*$ (for CSGM we use ${\bm{z}}_0 = {\bm{\mu}}_{\phi}({\bm{x}}^*)$).
  • Figure 2: Evaluating the effectiveness of JPMAP vs CSGM as a function of $\sigma_{\operatorname{DVAE}}$ (same setup as Figure \ref{['fig:DVAE-generative-quality']}). Without a denoising criterion ($\sigma_{\operatorname{DVAE}}=0$) the JPMAP algorithm may provide wrong guesses ${\bm{z}}^{1}$ when applying the encoder in step 2 of Algorithm \ref{['alg:JPMAP2new']}. For $\sigma_{\operatorname{DVAE}}>0$, however, the alternating minimization algorithm can benefit from the robust initialization heuristics provided by the encoder, and it consistently converges to a better local optimum than the simple gradient descent in CSGM.
  • Figure 3: Encoder approximation: (a) Contour plots of $-\log p_\theta({\bm{x}}|{\bm{z}}) +\frac{1}{2}\|{\bm{z}}\|^2$ and $-\log q_\phi({\bm{z}}|{\bm{x}})$ for a fixed ${\bm{x}}$ and for a random 2D subspace in the ${\bm{z}}$ domain (the plot shows $\pm 2 {\bm{\Sigma}}_{\phi}^{1/2}$ around ${\bm{\mu}}_{\phi}$). Observe the relatively small gap between the true posterior $p_\theta({\bm{z}}|{\bm{x}})$ and its variational approximation $q_\phi({\bm{z}}|{\bm{x}})$. This figure shows some evidence of partial ${\bm{z}}$-convexity of $J_1$ around the minimum of $J_2$, but it does not show how far ${\bm{z}}^1$ is from ${\bm{z}}^2$. (b) Decoded exact optimum ${\bm{x}}_1 = {\bm{\mu}}_{\theta}\left( \arg\max_{\bm{z}} p_\theta({\bm{x}}|{\bm{z}})e^{\frac{1}{2}\|{\bm{z}}\|^2} \right)$. (c) Decoded approximate optimum ${\bm{x}}_2 = {\bm{\mu}}_{\theta}\left( \arg\max_{\bm{z}} q_\phi({\bm{z}}|{\bm{x}}) \right)$. (d) Difference between (b) and (c).
  • Figure 4: Effectiveness of the encoder approximation: We take ${\bm{x}}_0$ from the test set of MNIST and minimize $J_1({\bm{x}}_0,{\bm{z}})$ with respect to ${\bm{z}}$ using gradient descent from random Gaussian initializations ${\bm{z}}_0$. The blue thick curve represents the trajectory if we initialize at the encoder approximation ${\bm{z}}^1=\mathop{\mathrm{arg\,min}}\limits_{\bm{z}} J_2({\bm{x}}_0,{\bm{z}})={\bm{\mu}}_{\phi}({\bm{x}}_0)$. (a): Plots of the energy iterates $J_1({\bm{x}}_0,{\bm{z}}_k)$. (b): $\ell^2$ distances of each trajectory with respect to the global optimum ${\bm{z}}^*$. Conclusion: Observe that the encoder initialization allows much faster convergence both in energy and in ${\bm{z}}$, and it avoids the few random initializations that lead to a wrong stationary point different from the unique global minimizer.
  • Figure 5: Evolution of Algorithm \ref{['alg:MAPzCS']}. In this interpolation example, JPMAP starts with the initialization in $(a)$. During the first iterations $(b)-(d)$, where $\beta_k$ is small, ${\bm{x}}_k$ and $\mathsf{G}({\bm{z}}_k)$ start loosely approaching each other at a coarse scale, and ${\bm{x}}_k$ only fills missing pixels with those of $\mathsf{G}({\bm{z}}_k)$ (in particular the noise of ${\bm{y}}$ is still present). By increasing $\beta_k$ in $(e)-(f)$ we enforce $\|\mathsf{G}({\bm{z}}_k)-{\bm{x}}_k\|^2 \leq \varepsilon$. Here we set $\varepsilon=\left(\frac{3}{255}\right)^2 d$, that is, an MSE of 3 gray levels.
  • ...and 5 more figures
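
The continuation behaviour described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions: a toy linear map `W` stands in for the generator $\mathsf{G}$, the coupling weight `beta` grows geometrically (the factor `growth` is a hypothetical schedule, not taken from the paper), and the inner loop uses closed-form block updates that only exist in this linear-Gaussian setting. Only the stopping rule $\|\mathsf{G}({\bm{z}}_k)-{\bm{x}}_k\|^2 \leq \varepsilon$ with $\varepsilon=(3/255)^2 d$ comes from the caption.

```python
import numpy as np

def gap_tolerance(d, gray_levels=3.0):
    """epsilon = (gray_levels / 255)^2 * d, i.e. an MSE of `gray_levels` gray levels."""
    return (gray_levels / 255.0) ** 2 * d

def continuation(A, y, W, sigma, beta0=1.0, growth=2.0,
                 inner_iters=20, max_outer=40):
    """Increase beta until the decoded latent W @ z and the image x agree."""
    d, k = W.shape
    eps = gap_tolerance(d)
    x = A.T @ y           # crude initialization (hypothetical choice)
    z = np.zeros(k)
    beta = beta0
    AtA = A.T @ A / sigma**2
    Aty = A.T @ y / sigma**2
    gap = np.inf
    for _ in range(max_outer):
        Hx = AtA + beta * np.eye(d)
        Hz = beta * (W.T @ W) + np.eye(k)
        # Inner alternating minimization, warm-started from the previous beta.
        for _ in range(inner_iters):
            z = np.linalg.solve(Hz, beta * (W.T @ x))
            x = np.linalg.solve(Hx, Aty + beta * (W @ z))
        gap = np.sum((W @ z - x) ** 2)
        if gap <= eps:    # stopping rule from the Figure 5 caption
            break
        beta *= growth    # tighten the coupling between x and W @ z
    return x, z, beta, gap
```

For a 28×28 image ($d = 784$, an assumption matching MNIST-sized inputs) the tolerance evaluates to $9 \cdot 784 / 65025 \approx 0.1085$. Small values of `beta` let `x` follow the data term closely, while large values pull `x` onto the range of the toy generator.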

Theorems & Definitions (14)

  • Proposition 2.1: Convergence of Algorithm \ref{['alg:JPMAP3new']}
  • Proof 1
  • Remark 2.2
  • Lemma A.1
  • Lemma A.2
  • Proof 2
  • Lemma A.3
  • Proposition B.1: map-${\bm{z}}$ estimator for deterministic generative models
  • Proposition C.1
  • Proof 3
  • ...and 4 more