A Non-negative VAE: the Generalized Gamma Belief Network

Zhibin Duan, Tiansheng Wen, Muyao Wang, Bo Chen, Mingyuan Zhou

TL;DR

This paper addresses the limited expressivity of gamma belief networks, which stems from their linear decoder, by introducing the Generalized Gamma Belief Network, a hierarchical non-linear generative model with sparse, non-negative gamma latents. It develops an upward-downward variational inference framework that uses a Weibull posterior to approximate the intractable gamma conditionals, and it jointly optimizes the generative model and the inference network. Empirical results show expressivity on par with state-of-the-art Gaussian VAEs and strong disentanglement without extra regularizers, highlighting the benefits of sparse gamma latents for interpretability. The approach performs well across text and image benchmarks, suggesting it applies to complex data while preserving the interpretability advantages of gamma latent variables.

Abstract

The gamma belief network (GBN), often regarded as a deep topic model, has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data. Its notable capability to acquire interpretable latent factors is partially attributed to sparse and non-negative gamma-distributed latent variables. However, the existing GBN and its variations are constrained by the linear generative model, thereby limiting their expressiveness and applicability. To address this limitation, we introduce the generalized gamma belief network (Generalized GBN) in this paper, which extends the original linear generative model to a more expressive non-linear generative model. Since the parameters of the Generalized GBN no longer possess an analytic conditional posterior, we further propose an upward-downward Weibull inference network to approximate the posterior distribution of the latent variables. The parameters of both the generative model and the inference network are jointly trained within the variational inference framework. Finally, we conduct comprehensive experiments on both expressivity and disentangled representation learning tasks to evaluate the performance of the Generalized GBN against state-of-the-art Gaussian variational autoencoders serving as baselines.
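The abstract does not spell out how a Weibull posterior is matched to a gamma prior, so the following is only a minimal sketch built on two standard identities: the inverse-CDF reparameterization of a Weibull draw, and the closed-form KL divergence from a Weibull to a gamma distribution. The function names `sample_weibull` and `kl_weibull_gamma` are hypothetical, and the gamma distribution is assumed to use a shape/rate parameterization; neither is taken from the paper's code.

```python
import math
import numpy as np


def sample_weibull(k, lam, rng):
    """Reparameterized Weibull(k, lam) draw via the inverse CDF.

    theta = lam * (-log(1 - u)) ** (1 / k), with u ~ Uniform(0, 1), so the
    sample is a deterministic, differentiable function of (k, lam) given u.
    """
    u = rng.uniform(size=np.shape(k))
    return lam * (-np.log1p(-u)) ** (1.0 / k)


def kl_weibull_gamma(k, lam, alpha, beta):
    """Closed-form KL(Weibull(k, lam) || Gamma(alpha, rate=beta)), elementwise.

    Assumption: the gamma prior uses a shape/rate parameterization.
    """
    lgamma = np.vectorize(math.lgamma)   # log Gamma function
    gamma_fn = np.vectorize(math.gamma)  # Gamma function
    return (
        np.euler_gamma * alpha / k
        - alpha * np.log(lam)
        + np.log(k)
        + beta * lam * gamma_fn(1.0 + 1.0 / k)
        - np.euler_gamma
        - 1.0
        - alpha * np.log(beta)
        + lgamma(alpha)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = np.array([0.5, 1.0, 2.0])
    lam = np.array([1.0, 2.0, 0.5])
    theta = sample_weibull(k, lam, rng)            # non-negative latents
    print(theta)
    print(kl_weibull_gamma(k, lam, alpha=1.0, beta=1.0))
```

A quick sanity check on the KL term: a Weibull with shape 1 and scale `lam` is an exponential distribution, which is the same as a gamma with shape 1 and rate `1 / lam`, so the divergence between the two should vanish.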

Paper Structure (16 sections, 11 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Probability histograms of the hidden representation of an autoencoder trained on the MNIST dataset: (a) the output of a linear layer and (b) the output of a ReLU activation layer. The former is well approximated by a Gaussian distribution (c), while the latter is well approximated by a gamma distribution (d).
  • Figure 2: Graphical models. (a) The generative model of the gamma belief network (GBN) and a sketch of the upward-downward Gibbs sampler, where $Z^{(l)}$ are augmented latent counts that are upward sampled in each Gibbs sampling iteration; (b) the generative model and inference network of the hierarchical Gaussian VAE; (c) inference and generation in the generalized gamma belief network (Generalized GBN). Circles are stochastic variables, and squares are deterministic variables.
  • Figure 3: Reconstruction and unconditional generation samples of the Generalized GBN.
  • Figure 4: FID scores for unconditional generation on CIFAR-10.
  • Figure 5: (a) Representations learned by the Generalized GBN. Each row represents a latent $\theta_i$. Column 1 (position) shows the mean activation (red represents high values) of each latent $\theta_i$ as a function of all 32x32 locations, averaged across objects, rotations and scales. Columns 2 and 3 show the mean activation of each unit $\theta_i$ as a function of scale (respectively rotation), averaged across rotations and positions (respectively scales and positions). Square is red, oval is green and heart is blue. Columns 4-8 (second group) show reconstructions resulting from the traversal of each latent $\theta_i$ over 0 to 6 while keeping the remaining 9/10 latent units fixed to the values obtained by running inference on an image from the dataset. (b) A similar analysis for the Gaussian VAE.