Explicitly Minimizing the Blur Error of Variational Autoencoders

Gustav Bredell, Kyriakos Flouris, Krishna Chaitanya, Ertunc Erdil, Ender Konukoglu

TL;DR

The paper tackles blur in variational autoencoders by explicitly weighting the reconstruction errors caused by blur while preserving the ELBO objective. It introduces a Wiener-deconvolution–based weighting in the Fourier domain and ties it to a Gaussian likelihood with a structured covariance $\Sigma_k$ that depends on a per-image kernel predicted by a neural network $G_{\gamma}(z)$. The determinant of the covariance is computed efficiently via circulant matrix properties, which keeps optimization tractable under an alternating scheme that updates the VAE parameters and the kernel predictor in turn. Empirical results on CelebA (64×64 and 256×256) and HCP MRI data show sharper reconstructions and improved perceptual metrics (LPIPS, FID) compared to standard and perceptual reconstruction losses, supporting the approach's effectiveness across domains.
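The two ingredients named above, a Wiener-style weighting of the residual in the Fourier domain and a determinant obtained from circulant structure, can be illustrated with a minimal 1D NumPy sketch. This is not the authors' implementation: the Gaussian kernel, the regularization constant `C`, the filter form `conj(K) / (|K|^2 + C)`, the choice of precision matrix $\Sigma^{-1} = W^\top W$, and all function names are assumptions made for illustration only. The sketch relies on the standard fact that a circulant matrix is diagonalized by the DFT, so its eigenvalues are the FFT of its first column.

```python
import numpy as np

def circulant(first_col):
    """Dense circulant matrix with C[i, j] = first_col[(i - j) % n]."""
    n = len(first_col)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return first_col[idx]

def wiener_filter(kernel, C=1e-2):
    """Assumed Wiener-style filter in the Fourier domain (C is a hypothetical constant)."""
    K = np.fft.fft(kernel)
    return np.conj(K) / (np.abs(K) ** 2 + C)

def weighted_sq_error(err, W):
    """Squared norm of the filtered residual, computed in the Fourier domain.

    The division by n comes from Parseval's identity for NumPy's unnormalized FFT.
    """
    n = len(err)
    return np.sum(np.abs(W * np.fft.fft(err)) ** 2) / n

def logdet_precision(W):
    """log det of W_mat^T W_mat, where W_mat is the circulant operator of the filter.

    The eigenvalues of W_mat are the entries of W, so no dense matrix is needed.
    """
    return 2.0 * np.sum(np.log(np.abs(W)))

# Toy setup: a normalized, circularly symmetric Gaussian blur kernel (assumed).
n = 16
pos = np.minimum(np.arange(n), n - np.arange(n))  # circular distance from index 0
kernel = np.exp(-0.5 * (pos / 1.0) ** 2)
kernel /= kernel.sum()

W = wiener_filter(kernel)
rng = np.random.default_rng(0)
err = rng.standard_normal(n)  # stand-in for a residual x - x_hat

# Dense reference operator, used only to check the Fourier-domain shortcuts.
W_mat = circulant(np.fft.ifft(W).real)
dense_error = np.sum((W_mat @ err) ** 2)
dense_logdet = np.linalg.slogdet(W_mat.T @ W_mat)[1]

print(weighted_sq_error(err, W), dense_error)
print(logdet_precision(W), dense_logdet)
```

Each printed pair should agree up to floating-point error. The point of the design is that both the weighted error and the log-determinant cost only one FFT of length n, whereas the dense route requires building and factorizing an n×n matrix.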

Abstract

Variational autoencoders (VAEs) are powerful generative modelling methods; however, they suffer from blurry generated samples and reconstructions compared to the images they have been trained on. Significant research effort has been spent on increasing their generative capabilities by creating more flexible models, but flexibility often comes at the cost of higher complexity and computational cost. Several works have focused on altering the reconstruction term of the evidence lower bound (ELBO), though often at the expense of losing the mathematical link to maximizing the likelihood of the samples under the modeled distribution. Here we propose a new formulation of the reconstruction term for the VAE that specifically penalizes the generation of blurry images while still maximizing the ELBO under the modeled distribution. We show the potential of the proposed loss on three different data sets, where it outperforms several recently proposed reconstruction losses for VAEs.

Paper Structure (16 sections, 12 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Sketch of the proposed approach for minimizing the blur error in VAEs. We illustrate the two losses that we minimize in the alternating optimization and the corresponding network parameters that are updated in different colors.
  • Figure 2: The evolution of the estimated kernel, $G_{\gamma}(z)$, is shown for four different images at different epochs during training. The blur-minimizing reconstruction term is introduced at epoch 10, after which a strong decrease of the estimated blur kernel can be observed along with sharper image reconstructions. In addition, different images have different blur kernel estimates, which motivates determining the blur kernel per image and making $\Sigma$ dependent on $z$.
  • Figure 3: Qualitative results of reconstructions and generations from the proposed method and relevant compared methods on the CelebA256 dataset. We observe lower blurriness and higher sharpness for the reconstructed images from the proposed method.
  • Figure 4: Qualitative results of reconstructions and generations from the proposed method and relevant compared methods on the HCP medical dataset. We observe lower blurriness and higher sharpness for the reconstructed images from the proposed method.
  • Figure 5: Qualitative results of reconstructions and generations from the proposed method and relevant compared methods on the CIFAR10 dataset (Krizhevsky, 2009). We observe lower blurriness and higher sharpness for the reconstructed images from the proposed method.
  • ...and 4 more figures