Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior

Mingyuan Yan, Jiawei Wu, Rushi Shah, Dianbo Liu

TL;DR

GM-VQ introduces a Gaussian mixture prior over latent variables in a VQ-VAE framework and couples discrete codewords with continuous noise through a shared codebook. It derives the Aggregated Categorical Posterior Evidence Lower Bound (ALBO), which aligns the variational distributions with the generative model while remaining compatible with Gumbel-Softmax gradient estimates. Empirical results on CIFAR-10 and CelebA show improved reconstruction accuracy and much higher codebook utilization, particularly when entropy regularization is applied, illustrating a robust discrete-continuous latent representation. This principled approach reduces training instability and dependence on heuristics, enabling more effective tokenization and use of latent capacity in generative modeling.

Abstract

Vector quantization is a widely used method for mapping continuous representations to a discrete space and has important applications in tokenization for generative models, information bottlenecking, and many other tasks in machine learning. The Vector Quantized Variational Autoencoder (VQ-VAE) is a type of variational autoencoder that uses discrete embeddings as latents. We generalize the technique further, enriching the probabilistic framework with a Gaussian mixture as the underlying generative model. This framework leverages a codebook of latent means and adaptive variances to capture complex data distributions. It avoids various heuristics and strong assumptions that VQ-VAE needs in order to address training instability and improve codebook utilization, and it integrates the benefits of both discrete and continuous representations within a variational Bayesian framework. Furthermore, by introducing the Aggregated Categorical Posterior Evidence Lower Bound (ALBO), we offer a principled alternative optimization objective that aligns variational distributions with the generative model. Our experiments demonstrate that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafted heuristics.
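To make the combination of discrete codewords and continuous noise concrete, the following is a minimal sketch of one latent sample, not the authors' implementation. It assumes that assignment logits are negative squared distances between a proxy latent and each codeword, that a single log-variance controls the noise scale, and that the discrete choice is relaxed with Gumbel-Softmax. The names `gumbel_softmax`, `sample_latent`, `proxy`, and `log_var` are hypothetical and chosen only for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from unnormalized categorical logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_latent(proxy, codebook, log_var, temperature, rng):
    """Combine a softly selected codeword with Gaussian noise (illustrative only)."""
    # Assumption: logits are negative squared distances to each codeword.
    logits = [-sum((p - c) ** 2 for p, c in zip(proxy, code)) for code in codebook]
    weights = gumbel_softmax(logits, temperature, rng)
    # Soft codeword: weighted average of codebook entries (the mixture means).
    dim = len(proxy)
    mean = [sum(w * code[d] for w, code in zip(weights, codebook)) for d in range(dim)]
    # Assumption: one shared log-variance sets the scale of the continuous noise.
    std = math.exp(0.5 * log_var)
    latent = [m + std * rng.gauss(0.0, 1.0) for m in mean]
    return weights, latent


rng = random.Random(0)
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
weights, latent = sample_latent([0.9, 1.1], codebook, log_var=-2.0, temperature=0.5, rng=rng)
```

Lower temperatures push `weights` toward a one-hot vector, which approaches hard codeword selection; higher temperatures give smoother assignments. Because `weights` is a differentiable function of the logits, an autodiff framework could pass gradients through it.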

Paper Structure

This paper contains 23 sections, 19 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of GM-VQ. First, the encoder deterministically maps the input to proxy latents, which are then used to retrieve corresponding codewords from the codebook and generate noise. The codewords and noise are then combined to form the continuous latents. Finally, these continuous latents are passed through the decoder to produce the final output.
  • Figure 2: Probabilistic Graphical Model depicting the Gaussian Mixture Vector Quantization (GM-VQ) for the generative model (left) and the inference model (right). The codebook $\mathbf{M}$ plays a dual role, being shared between both the generative and inference models.
  • Figure 3: Gradient Bias vs. Entropy Relationship
  • Figure 4: Box plots showing the impact of entropy regularization on reconstruction quality (MSE) and codebook utilization (perplexity) for the GM-VQ model. The left panel shows the general trend of decreasing MSE with increasing entropy; the right panel shows perplexity rising with increasing entropy.
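The link between entropy and perplexity in Figure 4 can be illustrated with a short sketch. This excerpt does not state how the paper computes perplexity. The code below assumes the common definition: the exponential of the entropy of the batch-averaged code assignment distribution. The function names are hypothetical.

```python
import math


def aggregated_posterior(assignments):
    """Average per-sample categorical probabilities over a batch."""
    n = len(assignments)
    k = len(assignments[0])
    return [sum(row[j] for row in assignments) / n for j in range(k)]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def perplexity(probs):
    """Effective number of codewords in use under the assumed definition."""
    return math.exp(entropy(probs))


# Each sample picks a different codeword: usage is spread evenly.
spread = aggregated_posterior([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Every sample picks the same codeword: the codebook has collapsed.
collapsed = aggregated_posterior([[1.0, 0.0, 0.0, 0.0]] * 3)
```

Under this definition, perplexity ranges from 1 (a single codeword used) to the codebook size (all codewords used equally), so a regularizer that raises the entropy of the aggregated distribution raises perplexity as well.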