EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders

Gulcin Baykal; Melih Kandemir; Gozde Unal

TL;DR

EdVAE tackles codebook collapse in discrete VAEs by introducing an evidential, uncertainty-aware hierarchical framework that replaces softmax with a Dirichlet-Categorical structure. By modeling the Dirichlet concentration parameters as functions of the encoder's evidence, it achieves more diverse codebook usage and improved reconstruction. Across CIFAR10, CelebA, and LSUN Church, EdVAE shows higher perplexity and lower MSE than the baselines, often rivaling or surpassing state-of-the-art VQ-VAE variants. The approach provides a principled, uncertainty-aware mechanism with practical benefits for discrete latent representations in generative modeling.
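Perplexity here measures how many codebook entries are effectively in use. The sketch below assumes the definition common in VQ-VAE codebases: the exponential of the entropy of the empirical code-usage distribution. The function name `codebook_perplexity` and the toy assignments are hypothetical and are not taken from the EdVAE repository.

```python
import numpy as np

def codebook_perplexity(assignments: np.ndarray, num_codes: int) -> float:
    """Perplexity of codebook usage (assumed standard definition).

    assignments: integer code indices, one per spatial position (any shape).
    Returns exp(entropy of the empirical usage distribution): 1.0 means every
    position uses the same code (full collapse); num_codes means uniform use.
    """
    counts = np.bincount(assignments.ravel(), minlength=num_codes)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # treat 0 * log(0) as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy))

# Toy 4x4 grids of code assignments for a codebook of 8 entries.
collapsed = np.zeros((4, 4), dtype=int)        # every position picks code 0
diverse = np.arange(16).reshape(4, 4) % 8      # each of the 8 codes used twice

print(codebook_perplexity(collapsed, 8))  # 1.0
print(codebook_perplexity(diverse, 8))    # approximately 8.0
```

Higher perplexity therefore indicates that more of the codebook is being used, which is why it serves as a direct measure of codebook collapse.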

Abstract

Codebook collapse is a common problem in training deep generative models with discrete representation spaces, such as Vector Quantized Variational Autoencoders (VQ-VAEs). We observe that the same problem arises for the alternatively designed discrete variational autoencoders (dVAEs), whose encoder directly learns a distribution over the codebook embeddings to represent the data. We hypothesize that using the softmax function to obtain this probability distribution causes codebook collapse by assigning overconfident probabilities to the best-matching codebook elements. In this paper, we propose a novel way to incorporate evidential deep learning (EDL) in place of softmax to combat the codebook collapse problem of dVAE. In contrast to softmax, our evidential approach monitors the significance of attaining the probability distribution over the codebook embeddings. Our experiments on various datasets show that our model, called EdVAE, mitigates codebook collapse while improving reconstruction performance, and enhances codebook usage compared to dVAE- and VQ-VAE-based models. Our code can be found at https://github.com/ituvisionlab/EdVAE.
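The hypothesis about softmax overconfidence can be illustrated by comparing softmax with an evidential head applied to the same logits. This is a sketch, not the EdVAE implementation: it follows the standard EDL recipe (non-negative evidence, Dirichlet concentration alpha = evidence + 1, expected probabilities alpha / S and uncertainty K / S with S = sum(alpha)). Using ReLU as the evidence activation and the specific logit values are assumptions.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def evidential(logits: np.ndarray):
    """Standard EDL head (assumption: evidence = ReLU(logits))."""
    evidence = np.maximum(logits, 0.0)           # non-negative evidence per code
    alpha = evidence + 1.0                       # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True) # S = sum of alphas
    expected_probs = alpha / strength            # mean of Dir(alpha)
    num_codes = logits.shape[-1]
    uncertainty = num_codes / strength[..., 0]   # u = K / S, in (0, 1]
    return expected_probs, uncertainty

logits = np.array([5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # K = 8 codes
p_soft = softmax(logits)
p_evid, u = evidential(logits)

print(round(float(p_soft.max()), 3))  # 0.945
print(round(float(p_evid.max()), 3))  # 0.429
print(round(float(u), 3))             # 0.571
```

With these logits, softmax places about 94% of the mass on a single code, whereas the evidential mean spreads more mass across the codebook and reports a high uncertainty. This is the kind of behavior the paper argues keeps more codebook entries in use.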

Paper Structure (24 sections, 18 equations, 10 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Background
Discrete Variational Autoencoders
Hierarchical Bayesian Models
Evidential Deep Learning as a Hierarchical Bayesian Model
Method
EdVAE Design
Experiments
Experimental Settings
Evaluations
Effects of the softmax distribution
Uncertainty vs Perplexity
Perplexity and reconstruction performance
Effects of codebook design
...and 9 more sections

Figures (10)

Figure 1: Overview of the method. An illustrative codebook is defined as $\mathcal{M} \in \mathbb{R}^{8\times 4}$, where 8 is the number of codebook embeddings and 4 is the dimensionality of each embedding. For each of the 16 spatial positions in $z_e(x)$ (an $N \times N$ grid with $N = 4$), we define a Dirichlet prior over the parameters of the Categorical distribution that models the assignment of a codebook embedding to that position.
Figure 2: Entropy visualization of the probability distributions for CIFAR10.
Figure 3: EdVAE training on CIFAR10: perplexity values increase during training as the uncertainty values increase.
Figure 4: Reconstructions from (a) CIFAR10, (b) CelebA.
Figure 5: Reconstructions from LSUN Church.
...and 5 more figures
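The shapes in Figure 1 can be made concrete with a short sketch: a codebook of K = 8 embeddings of dimension 4, a 4×4 grid of positions, one Dirichlet prior per position, and Categorical parameters sampled from each prior. The random evidence values, the Gamma-normalization sampler, and the probability-weighted lookup into the codebook are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 8, 4, 4                   # codebook size, embedding dim, grid side
codebook = rng.normal(size=(K, D))  # M in R^{8 x 4}

# Hypothetical encoder output: non-negative evidence for each of the
# N * N = 16 positions and each of the K codebook embeddings.
evidence = np.maximum(rng.normal(size=(N * N, K)) * 3.0, 0.0)
alpha = evidence + 1.0              # one Dirichlet prior per spatial position

# Sample Categorical parameters pi ~ Dir(alpha) row-wise by normalizing
# independent Gamma(alpha_k, 1) draws.
gammas = rng.gamma(shape=alpha, scale=1.0)
pi = gammas / gammas.sum(axis=1, keepdims=True)  # (16, 8), rows sum to 1

# Assumed dVAE-style soft lookup: each position takes a probability-weighted
# mix of codebook embeddings, then is reshaped back to the N x N grid.
z_q = (pi @ codebook).reshape(N, N, D)

print(alpha.shape, pi.shape, z_q.shape)  # (16, 8) (16, 8) (4, 4, 4)
```

Normalizing Gamma draws is a standard way to sample many Dirichlet vectors at once; NumPy's `Generator.dirichlet` accepts only a single concentration vector per call.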