Wavelet-based Variational Autoencoders for High-Resolution Image Generation

Andrew Kiruluta

TL;DR

This paper tackles blurriness in high-resolution VAE generations by replacing the traditional Gaussian latent space with a multi-scale Haar wavelet latent representation. It introduces a learnable noise scale and a sparsity-promoting regularizer, adapts the reparameterization trick to wavelet coefficients, and decodes through an inverse discrete wavelet transform (IDWT). Empirically, the Wavelet-VAE yields sharper reconstructions and better perceptual realism than a conventional VAE on CIFAR-10 (128×128) and CelebA-HQ (128–256×256), as measured by reconstruction loss, SSIM, and FID. The work also highlights improved interpretability of latent features and compatibility with other VAE extensions, suggesting broad potential for wavelet-based multi-scale generative modeling in high-fidelity image synthesis.

Abstract

Variational Autoencoders (VAEs) are powerful generative models capable of learning compact latent representations. However, conventional VAEs often generate relatively blurry images because they assume an isotropic Gaussian latent space and have limited ability to capture high-frequency details. In this paper, we explore a novel wavelet-based approach (Wavelet-VAE) in which the latent space is constructed from multi-scale Haar wavelet coefficients. We propose a comprehensive method to encode image features into multi-scale detail and approximation coefficients, and we introduce a learnable noise parameter to maintain stochasticity. We discuss in detail how to reformulate the reparameterization trick, handle the KL divergence term, and integrate wavelet sparsity principles into the training objective. Our experimental evaluation on CIFAR-10 and other high-resolution datasets demonstrates that the Wavelet-VAE improves visual fidelity and recovers higher-resolution details compared to conventional VAEs. We conclude with a discussion of advantages, potential limitations, and future research directions for wavelet-based generative modeling.
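
To make the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a single-level 2D Haar transform and its inverse, an additive-noise reparameterization with a scalar log noise scale, and an L1 penalty on the detail coefficients. It is an illustration under stated assumptions, not the paper's implementation: the function names (haar_dwt2, haar_idwt2, reparameterize, l1_sparsity), the single decomposition level, the scalar log_sigma parameterization, and the plain L1 form of the sparsity term are all hypothetical choices made here. In a real model the noise scale would be a trainable parameter in an autodiff framework, and the coefficients would come from an encoder network rather than directly from the image.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W must be even. Returns the approximation band and a tuple of
    three detail bands, each of shape (H/2, W/2).
    """
    x = np.asarray(x, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    det_cols = (a - b + c - d) / 2.0     # difference across columns
    det_rows = (a + b - c - d) / 2.0     # difference across rows
    det_diag = (a - b - c + d) / 2.0     # diagonal difference
    return approx, (det_cols, det_rows, det_diag)


def haar_idwt2(approx, details):
    """Inverse of haar_dwt2: rebuilds the (2h, 2w) array from its bands."""
    det_cols, det_rows, det_diag = details
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (approx + det_cols + det_rows + det_diag) / 2.0
    x[0::2, 1::2] = (approx - det_cols + det_rows - det_diag) / 2.0
    x[1::2, 0::2] = (approx + det_cols - det_rows - det_diag) / 2.0
    x[1::2, 1::2] = (approx - det_cols - det_rows + det_diag) / 2.0
    return x


def reparameterize(coeffs, log_sigma, rng):
    """Assumed form: z = coeffs + exp(log_sigma) * eps, with eps ~ N(0, I).

    log_sigma stands in for the learnable noise parameter; here it is
    just a float, since NumPy has no trainable parameters.
    """
    eps = rng.standard_normal(coeffs.shape)
    return coeffs + np.exp(log_sigma) * eps


def l1_sparsity(details):
    """Assumed sparsity term: sum of absolute detail coefficients."""
    return sum(np.abs(band).sum() for band in details)
```

Because the transform is orthonormal, haar_idwt2(*haar_dwt2(img)) reproduces img up to floating-point error, so a decoder built on the inverse transform loses no information by itself; any blur or sharpness comes from how the coefficients are predicted and perturbed.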

Paper Structure

This paper contains 34 sections, 15 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Qualitative comparisons between baseline VAE (top row) and Wavelet-VAE (bottom row). Noticeably sharper edges and more refined textures are present in the Wavelet-VAE reconstructions, illustrating improved image quality.
  • Figure 2: Heatmap of Haar wavelet coefficients extracted from the latent space of a Wavelet-VAE trained on CIFAR-10 images. Bright areas indicate regions of high importance for image reconstruction, while darker regions represent areas of lower significance.