Identifiable Deep Generative Models via Sparse Decoding

Gemma E. Moran, Dhanya Sridhar, Yixin Wang, David M. Blei

TL;DR

The paper tackles the challenge of learning identifiable, interpretable latent representations from high-dimensional tabular data. It introduces a sparse deep generative model (DGM) and a Sparse Variational Autoencoder (Sparse VAE) that impose sparsity in the factor-to-feature decoder via a Spike-and-Slab Lasso prior, enabling each feature to depend on a subset of latent factors. Identifiability is established under an anchor-feature assumption, with two theorems covering known and unknown anchor scenarios, and this theory is complemented by extensive experiments across synthetic, semi-synthetic, text, rating, and genomics data showing improved held-out predictive performance and interpretable factors. The work advances transferable, disentangled representations in DGMs and provides a principled framework for sparse, identifiable deep generative modeling of tabular data.
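
For orientation, a Spike-and-Slab Lasso prior is commonly written as a two-component Laplace mixture on each decoder weight. The form below is a sketch of that standard formulation, not a quotation from the paper; the symbols $w_{jk}$, $\gamma_{jk}$, $\eta_k$, $\lambda_0$, $\lambda_1$, $a$ and $b$ are notation assumed here for illustration, with $\mathrm{Laplace}(\lambda)$ denoting the density $\tfrac{\lambda}{2} e^{-\lambda |w|}$.

$$
\begin{aligned}
w_{jk} \mid \gamma_{jk} &\sim \gamma_{jk}\,\mathrm{Laplace}(\lambda_1) + (1-\gamma_{jk})\,\mathrm{Laplace}(\lambda_0), \qquad \lambda_0 \gg \lambda_1,\\
\gamma_{jk} \mid \eta_k &\sim \mathrm{Bernoulli}(\eta_k), \qquad \eta_k \sim \mathrm{Beta}(a, b).
\end{aligned}
$$

Under this reading, the spike component (large $\lambda_0$) pulls most weights toward zero, while the slab component (small $\lambda_1$) leaves the few active factor-to-feature weights only lightly penalized.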

Abstract

We develop the sparse VAE for unsupervised representation learning on high-dimensional data. The sparse VAE learns a set of latent factors (representations) which summarize the associations in the observed data features. The underlying model is sparse in that each observed feature (i.e. each dimension of the data) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We prove such sparse deep generative models are identifiable: with infinite data, the true model parameters can be learned. (In contrast, most deep generative models are not identifiable.) We empirically study the sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
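
The sparsity structure described above can be illustrated with a small NumPy sketch. It assumes that feature $j$ is decoded by masking the factors elementwise with row $j$ of $\bm{W}$ before passing them through a shared network; that masking form, the toy two-layer network, and the names `sparse_decode` and `make_toy_network` are illustrative assumptions, not the authors' implementation. The mask matrix follows the pattern in the Figure 1 caption (five features, two factors).

```python
import numpy as np


def sparse_decode(z, W, f):
    """Decode feature j from its own masked copy of the factors.

    z: (n, K) latent factors; W: (G, K) factor-to-feature mask/weights;
    f: shared network mapping an (n, K) array to an (n, G) array.
    Returns an (n, G) array whose column j is f(z * W[j])[:, j].
    """
    n = z.shape[0]
    g = W.shape[0]
    x_mean = np.empty((n, g))
    for j in range(g):
        masked = z * W[j]            # broadcasts (n, K) * (K,)
        x_mean[:, j] = f(masked)[:, j]
    return x_mean


def make_toy_network(k, g, hidden=8, seed=0):
    """Hypothetical stand-in for the shared decoder network."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(k, hidden))
    B = rng.normal(size=(hidden, g))
    return lambda h: np.tanh(h @ A) @ B


# Sparsity pattern from the Figure 1 caption: features 1-2 use factor 1,
# feature 3 uses both factors, features 4-5 use factor 2.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 1.0]])

f = make_toy_network(k=2, g=5)
z = np.random.default_rng(1).normal(size=(4, 2))
x = sparse_decode(z, W, f)

# Perturbing factor 2 must leave features 1-2 untouched.
z_shift = z.copy()
z_shift[:, 1] += 10.0
x_shift = sparse_decode(z_shift, W, f)
print(np.allclose(x[:, :2], x_shift[:, :2]))
```

The final check shows the point of the construction: a zero entry in $\bm{W}$ removes a factor's influence on that feature entirely, regardless of what the shared network computes.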

Paper Structure

This paper contains 21 sections, 2 theorems, 41 equations, 4 figures, 11 tables, 1 algorithm.

Key Result

Theorem 1

Suppose we have infinite data drawn from the model in eq:sparse_dgm and $A1$ holds. Assume we are given the rows of $\bm{W}$ corresponding to the anchor features. Suppose we have two solutions with equal likelihood: $\{\widetilde{\theta}, \widetilde{\bm{z}}\}$ and $\{\widehat{\theta}, \widehat{\bm{z}}\}$. Then, the factors $\widetilde{\bm{z}}$ and $\widehat{\bm{z}}$ are equal up to coordinate-wise trans…

Figures (4)

  • Figure 1: In a DGM, a feature $x_{ij}$ depends on all factors, $z_{ik}$. A sparse DGM is displayed where features $x_{i1}, x_{i2}$ depend only on $z_{i1}$; $x_{i3}$ depends on $z_{i1}$ and $z_{i2}$; and $x_{i4}, x_{i5}$ depend only on $z_{i2}$. The features are passed through the same neural network $f_{\theta}$.
  • Figure 2: (a-b) The sparse VAE estimates factors which recover the true generative process; the VAE does not. The observed data is plotted against the estimated factors. The true factor-feature relationship is the red line; the best fit coefficients for the estimated factors are in the grey boxes. (c) The true $\bm{W}$ matrix. (d) The sparse VAE estimate of $\bm{W}$. (VAE has no $\bm{W}$ matrix).
  • Figure 3: Synthetic data. (a) Sparse VAE has better heldout predictive performance than the VAE over a range of factor correlation levels. (b) Sparse VAE recovers the true factors better than the VAE. ($\beta$-VAE performed similarly to VAE). Scores are shown for 25 datasets per correlation setting.
  • Figure 4: Synthetic data. The Sparse VAE with less regularization on $\bm{W}$ (SparseVAE-Slab) performs slightly worse than Sparse VAE with more regularization on $\bm{W}$ (SparseVAE) in terms of (a) MSE; (b) DCI Disentanglement Score; and (c) F-score of the estimated support of $\bm{W}$.
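
The F-score of the estimated support of $\bm{W}$ mentioned in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the estimated support is obtained by thresholding $|\bm{W}|$ at a hypothetical cutoff (`threshold=0.5`), and the columns of the estimate are assumed to be already aligned with the true factors. The exact procedure used in the paper is not given in this summary.

```python
import numpy as np


def support_f_score(W_true, W_est, threshold=0.5):
    """F1 score between the true and estimated nonzero patterns of W.

    threshold is a hypothetical cutoff on |W_est|; columns of W_est are
    assumed to be aligned with those of W_true.
    """
    true_support = W_true != 0
    est_support = np.abs(W_est) > threshold
    tp = np.sum(true_support & est_support)
    fp = np.sum(~true_support & est_support)
    fn = np.sum(true_support & ~est_support)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


# True pattern from the Figure 1 caption (5 features, 2 factors).
W_true = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)

# Toy estimate with one spurious entry (feature 1, factor 2).
W_est = np.array([[0.9, 0.7],
                  [1.1, 0.0],
                  [0.8, 1.2],
                  [0.1, 0.9],
                  [0.0, 1.0]])

score = support_f_score(W_true, W_est)
print(round(score, 3))
```

In this toy case all six true entries are recovered and one extra entry is selected, so precision is 6/7, recall is 1, and the F-score is 12/13.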

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2