Explicitly Minimizing the Blur Error of Variational Autoencoders
Gustav Bredell, Kyriakos Flouris, Krishna Chaitanya, Ertunc Erdil, Ender Konukoglu
TL;DR
The paper tackles blur in variational autoencoders (VAEs) by explicitly weighting reconstruction errors caused by blur while keeping the objective a valid ELBO. It introduces a Wiener-deconvolution-based weighting in the Fourier domain and ties it to a Gaussian likelihood with a structured covariance $\Sigma_k$ that depends on a per-image kernel predicted by a neural network $G_{\gamma}(z)$. The determinant of the covariance is computed efficiently via circulant matrix properties, which keeps optimization tractable under an alternating scheme that updates the VAE parameters and the kernel predictor in turn. Empirical results on CelebA (64×64 and 256×256) and HCP MRI data show sharper reconstructions and improved perceptual metrics (LPIPS, FID) compared to standard and perceptual reconstruction losses, supporting the approach's effectiveness across image domains.
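The circulant shortcut mentioned above can be illustrated with a minimal NumPy sketch. It relies only on the standard fact that the eigenvalues of a circulant matrix are the discrete Fourier transform of its first column, so the log-determinant reduces to a sum over FFT coefficients. The 1-D kernel, the size `n`, the added diagonal constant, and the helper names are hypothetical choices for illustration. This is not the authors' implementation, and the exact form of $\Sigma_k$ is not given in this summary.

```python
import numpy as np

def circulant_from_first_column(c):
    """Build the dense circulant matrix whose first column is c."""
    n = len(c)
    # Column j is c cyclically shifted down by j positions.
    return np.stack([np.roll(c, j) for j in range(n)], axis=1)

def circulant_logdet_fft(c):
    """Return log|det C|, using that the eigenvalues of C are the DFT of c."""
    eigenvalues = np.fft.fft(c)
    return float(np.sum(np.log(np.abs(eigenvalues))))

# Hypothetical symmetric 1-D blur kernel (assumption, for illustration only).
n = 64
kernel = np.zeros(n)
kernel[[0, 1, -1]] = [0.5, 0.25, 0.25]
kernel[0] += 0.1  # small diagonal constant so the matrix stays invertible

# Reference value from the dense matrix, O(n^3).
dense = circulant_from_first_column(kernel)
sign, logdet_dense = np.linalg.slogdet(dense)

# Same quantity from the FFT, O(n log n).
logdet_fft = circulant_logdet_fft(kernel)

print(np.isclose(logdet_dense, logdet_fft))
```

For images, the same idea carries over to block-circulant matrices with a 2-D FFT, which is why a determinant over all pixels need not be formed explicitly.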
Abstract
Variational autoencoders (VAEs) are powerful generative modelling methods; however, they suffer from blurry generated samples and reconstructions compared to the images they have been trained on. Significant research effort has been spent to increase the generative capabilities by creating more flexible models, but flexibility often comes at the cost of higher complexity and computational cost. Several works have focused on altering the reconstruction term of the evidence lower bound (ELBO), often at the expense of losing the mathematical link to maximizing the likelihood of the samples under the modeled distribution. Here we propose a new formulation of the reconstruction term for the VAE that specifically penalizes the generation of blurry images while still maximizing the ELBO under the modeled distribution. We show the potential of the proposed loss on three different data sets, where it outperforms several recently proposed reconstruction losses for VAEs.
