On Kernel-based Variational Autoencoder

Tian Qin; Wei-Min Huang

TL;DR

This work addresses the limited expressiveness of Gaussian posteriors in VAEs by introducing a KDE-based posterior, and derives a computable upper bound on the KL term in the ELBO. It proves that the Epanechnikov kernel minimizes that bound asymptotically and implements EVAE using a location-scale reparameterization to sample from the KDE-based posterior. Empirically, EVAE yields improved reconstruction quality and sharper images on MNIST, Fashion-MNIST, CIFAR-10, and CelebA, particularly at higher latent dimensions, while maintaining competitive training times. The approach provides a principled, flexible alternative to Gaussian VAEs and establishes a bridge between KDE theory and variational inference, with potential extensions to tighter KL bounds and different kernel criteria.
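For orientation, the standard objects the summary refers to can be written out as follows. This is only a sketch in textbook notation: the decoder symbol $p_{\theta}$ is an assumption not used above, and the Epanechnikov kernel is shown in one common normalization (support $[-1,1]$), which may differ from the scaling adopted in the paper.

```latex
% Standard ELBO; the KL term is the one the paper bounds from above
\log p_{\theta}(\mathbf{x}) \;\ge\;
  \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}\!\left[\log p_{\theta}(\mathbf{x}|\mathbf{z})\right]
  - \mathrm{KL}\!\left(q_{\phi}(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\right)

% One-dimensional kernels (common normalizations)
K_{\mathrm{Epa}}(u) = \tfrac{3}{4}\,(1-u^{2})\,\mathbf{1}\{|u|\le 1\},
\qquad
K_{\mathrm{Gauss}}(u) = \tfrac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}
```

The Epanechnikov kernel vanishes outside a bounded interval, whereas the Gaussian kernel is positive everywhere; this is the compact-support property the abstract appeals to.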

Abstract

In this paper, we bridge Variational Autoencoders (VAEs) and kernel density estimators (KDEs) by approximating the posterior with KDEs and deriving an upper bound on the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs makes it possible to optimize the posterior in VAEs, which not only addresses the limitations of the Gaussian latent space in the vanilla VAE but also provides a new perspective on estimating the KL divergence in the ELBO. Under appropriate conditions, we show that the Epanechnikov kernel is the asymptotically optimal choice for minimizing the derived upper bound on the KL divergence. Compared with the Gaussian kernel, the Epanechnikov kernel has compact support, which should make the generated samples less noisy and blurry. Implementing the Epanechnikov kernel in the ELBO is straightforward because it belongs to the "location-scale" family of distributions, to which the reparametrization trick applies directly. A series of experiments on benchmark datasets such as MNIST, Fashion-MNIST, CIFAR-10 and CelebA further demonstrates the superiority of the Epanechnikov Variational Autoencoder (EVAE) over the vanilla VAE in the quality of reconstructed images, as measured by FID score and sharpness.
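The location-scale reparametrization mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the independent per-coordinate treatment of the latent vector, and the use of the classical three-uniform sampler for the Epanechnikov density on $[-1,1]$ are all assumptions made for illustration. A real EVAE would use an autodiff framework so that gradients flow through the location and scale.

```python
import numpy as np

def sample_standard_epanechnikov(rng, shape):
    """Draw from K(u) = 0.75 * (1 - u^2) on [-1, 1] using three uniforms."""
    u1, u2, u3 = rng.uniform(-1.0, 1.0, size=(3, *shape))
    # If u3 has the largest magnitude of the three, return u2; otherwise u3.
    use_u2 = (np.abs(u3) >= np.abs(u2)) & (np.abs(u3) >= np.abs(u1))
    return np.where(use_u2, u2, u3)

def reparameterize(mu, scale, rng):
    """Location-scale transform z = mu + scale * eps (hypothetical helper).

    eps does not depend on mu or scale, so in an autodiff framework the
    sample z is differentiable with respect to both.
    """
    eps = sample_standard_epanechnikov(rng, mu.shape)
    return mu + scale * eps

# Illustrative usage: 100,000 two-dimensional latent draws.
rng = np.random.default_rng(0)
mu = np.zeros((100_000, 2))
scale = np.full((100_000, 2), 0.5)
z = reparameterize(mu, scale, rng)
```

Because the noise has bounded support, every draw satisfies $|z - \mu| \le \text{scale}$ coordinate-wise, unlike a Gaussian reparametrization, whose samples can land arbitrarily far from the mean.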

Paper Structure (35 sections, 2 theorems, 32 equations, 8 figures, 7 tables, 2 algorithms)


Introduction
Approximate posterior $q_{\phi}(\mathbf{z}|\mathbf{x})$:
Prior distribution $p(\mathbf{z})$ of latent variable $\mathbf{z}$:
Preliminary
VAE formulation
Model the posterior as the expectation of kernel density estimator
Choice of kernel
Epanechnikov VAE
Experiments
Benchmark datasets
Reconstruction samples
Unconditional samples
Comparisons with baselines
Extra experiments
Discussion and limitation
...and 20 more sections

Key Result

Theorem 3.1

Let assumptions $\textbf{A1}$-$\textbf{A4}$ in the Appendix hold and suppose that the weight function $a$ is integrable, piecewise continuous, and bounded. Suppose $b(n)=o(n^{-\frac{2}{9}})$ and $o(b(n))=n^{-\frac{1}{4}}(\log n)^{\frac{1}{2}}(\log\log n)^{\frac{1}{4}}$ as $n\to\infty$ ...

Figures (8)

Figure 1: (a) Sampled real images from hold-out samples in CIFAR-10 (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models.
Figure 2: (a) Unconditional samples generated from VAE (MNIST dataset) (b) Unconditional samples generated from EVAE with B=0.1 (c) Unconditional samples generated from EVAE with B=1 (d) Unconditional samples generated from EVAE with B=10
Figure S3: Red curve: Standard Epanechnikov kernel. Green curve: Standard Gaussian kernel.
Figure S4: (a) Sampled real images from hold-out samples in MNIST (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models. See the "Model architecture" section for the model architectures.
Figure S5: (a) Sampled real images from hold-out samples in Fashion-MNIST (b) Reconstructed images by VAE. (c) Reconstructed images by EVAE. Dimension $d_{z}=64$ for both models.
...and 3 more figures

Theorems & Definitions (2)

Theorem 3.1: Bickel & Rosenblatt
Lemma 3.2
