On Kernel-based Variational Autoencoder

Tian Qin, Wei-Min Huang

TL;DR

This work addresses the limited expressiveness of Gaussian posteriors in VAEs by introducing a KDE-based posterior, and derives a computable upper bound on the KL term in the ELBO. It proves that the Epanechnikov kernel minimizes that bound asymptotically and implements EVAE using a location-scale reparameterization to sample from the KDE-based posterior. Empirically, EVAE yields improved reconstruction quality and sharper images on MNIST, Fashion-MNIST, CIFAR-10, and CelebA, particularly at higher latent dimensions, while maintaining competitive training times. The approach provides a principled, flexible alternative to Gaussian VAEs and establishes a bridge between KDE theory and variational inference, with potential extensions to tighter KL bounds and different kernel criteria.

Abstract

In this paper, we bridge Variational Autoencoders (VAEs) and kernel density estimation (KDE) by approximating the posterior with KDEs and deriving an upper bound on the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs makes it possible to optimize the posterior in VAEs, which not only addresses the limitations of the Gaussian latent space in the vanilla VAE but also provides a new perspective on estimating the KL divergence in the ELBO. Under appropriate conditions, we show that the Epanechnikov kernel is asymptotically the optimal choice for minimizing the derived upper bound on the KL divergence. Compared with the Gaussian kernel, the Epanechnikov kernel has compact support, which should make generated samples less noisy and blurry. Implementing the Epanechnikov kernel in the ELBO is straightforward because it belongs to the "location-scale" family of distributions, where the reparametrization trick can be applied directly. A series of experiments on benchmark datasets such as MNIST, Fashion-MNIST, CIFAR-10 and CelebA further demonstrates the superiority of the Epanechnikov Variational Autoencoder (EVAE) over the vanilla VAE in the quality of reconstructed images, as measured by the FID score and Sharpness.
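The location-scale property mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Epanechnikov density $K(u)=\frac{3}{4}(1-u^{2})$ on $[-1,1]$ (the paper may use a different scaling) and draws the noise with the classical three-uniform rule from Devroye (1986). The function names `sample_epanechnikov` and `reparameterize` are hypothetical and are not taken from the paper's code.

```python
import random


def sample_epanechnikov(rng):
    """Draw one value from K(u) = 0.75 * (1 - u**2) on [-1, 1].

    Three-uniform rule: draw u1, u2, u3 uniformly on [-1, 1]; if u3 has the
    largest absolute value, return u2, otherwise return u3.
    """
    u1, u2, u3 = (rng.uniform(-1.0, 1.0) for _ in range(3))
    if abs(u3) >= abs(u2) and abs(u3) >= abs(u1):
        return u2
    return u3


def reparameterize(mu, sigma, rng):
    """Location-scale transform z = mu + sigma * eps, one coordinate at a time.

    The randomness lives only in eps, so z is a deterministic function of
    (mu, sigma) once eps is drawn. This is the property that lets gradients
    pass through mu and sigma in an autodiff framework.
    """
    return [m + s * sample_epanechnikov(rng) for m, s in zip(mu, sigma)]


# Illustrative values only: a 2-dimensional latent code.
rng = random.Random(0)
z = reparameterize([0.0, 2.0], [1.0, 0.5], rng)
```

Because the noise has compact support, each coordinate of `z` is guaranteed to lie in `[mu - sigma, mu + sigma]`, unlike a Gaussian draw, which is unbounded.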

Paper Structure

This paper contains 35 sections, 2 theorems, 32 equations, 8 figures, 7 tables, 2 algorithms.

Key Result

Theorem 3.1

Let assumptions $\textbf{A1}$-$\textbf{A4}$ in the Appendix hold and suppose that the weight function $a$ is integrable, piecewise continuous, and bounded. Suppose $b(n)=o(n^{-\frac{2}{9}})$ and $o(b(n))=n^{-\frac{1}{4}}(\log n)^{\frac{1}{2}}(\log\log n)^{\frac{1}{4}}$ as $n\to\infty$ ...

Figures (8)

  • Figure 1: (a) Sampled real images from hold-out samples in CIFAR-10 (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models.
  • Figure 2: (a) Unconditional samples generated from VAE (MNIST dataset) (b) Unconditional samples generated from EVAE with B=0.1 (c) Unconditional samples generated from EVAE with B=1 (d) Unconditional samples generated from EVAE with B=10
  • Figure S3: Red curve: Standard Epanechnikov kernel. Green curve: Standard Gaussian kernel.
  • Figure S4: (a) Sampled real images from hold-out samples in MNIST (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models. See the Model architecture section for the model architectures.
  • Figure S5: (a) Sampled real images from hold-out samples in Fashion-MNIST (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models.
  • ...and 3 more figures
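
The contrast drawn in Figure S3 between the two kernels can be reproduced numerically. The sketch below assumes the standard Epanechnikov kernel is $K(u)=\frac{3}{4}(1-u^{2})$ on $[-1,1]$ and the standard Gaussian kernel is the $N(0,1)$ density; the helper `kde` and the toy data are hypothetical and serve only to show how a kernel plugs into a density estimate.

```python
import math


def epanechnikov_kernel(u):
    """Standard Epanechnikov kernel: compact support on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian_kernel(u):
    """Standard Gaussian kernel: strictly positive for every finite u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel):
    """Kernel density estimate at x with bandwidth h (hypothetical helper)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


# Toy one-dimensional data, chosen only for illustration.
data = [-0.4, 0.0, 0.1, 0.5]
far_point = 3.0

# Far from the data, the Epanechnikov estimate is exactly zero, while the
# Gaussian estimate stays small but positive.
density_epa = kde(far_point, data, 1.0, epanechnikov_kernel)
density_gauss = kde(far_point, data, 1.0, gaussian_kernel)
```

With a compactly supported kernel the estimate assigns no mass outside a bandwidth-sized neighbourhood of the data, which is the property the abstract links to less noisy samples.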

Theorems & Definitions (2)

  • Theorem 3.1: Bickel & Rosenblatt
  • Lemma 3.2