Learning Sparse Codes with Entropy-Based ELBOs

Dmytro Velychko, Simon Damm, Asja Fischer, Jörg Lücke

TL;DR

This work addresses sparse coding with a Laplace prior on latent codes under a Gaussian observation model, and derives an entropy-based, fully analytic ELBO that enables learning without resorting to MAP approximations. By showing ELBO convergence to sums of entropies on a submanifold where the prior scales and noise variance are stationary, the authors obtain an analytic learning objective $\mathcal{L}^{\mathcal{H}}(\Phi,\tilde{W})$ with closed-form optimal parameters for $\boldsymbol{\lambda}$ and $\sigma^2$. The paper further introduces entropy annealing schemes that promote sparser and more localized encodings, demonstrates these methods on artificial bars and natural image patches, and discusses connections to $\ell_1$ sparsity and amortized inference. Overall, the results provide a principled, closed-form objective for probabilistic sparse coding with nontrivial posterior approximations and pave the way for applying entropy-based learning to deeper sparse generative models.

Abstract

Standard probabilistic sparse coding assumes a Laplace prior, a linear mapping from latents to observables, and Gaussian observable distributions. We here derive a solely entropy-based learning objective for the parameters of standard sparse coding. The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference; (B) unlike for previous non-trivial approximations, the novel objective is fully analytical; and (C) the objective allows for a novel principled form of annealing. The objective is derived by first showing that the standard ELBO objective converges to a sum of entropies, which matches similar recent results for generative models with Gaussian priors. The conditions under which the ELBO becomes equal to entropies are then shown to have analytical solutions, which leads to the fully analytical objective. Numerical experiments are used to demonstrate the feasibility of learning with such entropy-based ELBOs. We investigate different posterior approximations including Gaussians with correlated latents and deep amortized approximations. Furthermore, we numerically investigate entropy-based annealing which results in improved learning. Our main contributions are theoretical, however, and they are twofold: (1) for non-trivial posterior approximations, we provide the (to the knowledge of the authors) first analytical ELBO objective for standard probabilistic sparse coding; and (2) we provide the first demonstration on how a recently shown convergence of the ELBO to entropy sums can be used for learning.
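
The generative model named in the abstract (Laplace prior, linear mapping, Gaussian observables) can be illustrated with a minimal NumPy sketch. The parameterization is an assumption made for illustration: latents $z_h \sim \mathrm{Laplace}(0, \lambda_h)$ and observables $\mathbf{x} \mid \mathbf{z} \sim \mathcal{N}(\tilde{W}\mathbf{z}, \sigma^2 I)$, with $\tilde{W}$ normalized to unit column lengths. The function names, dimensions, and parameter values are hypothetical. The two entropy helpers use the standard closed-form entropies of a factorized Laplace and an isotropic Gaussian; they only indicate the kind of terms an entropy-based objective is built from and are not the paper's objective.

```python
import numpy as np

def sample_sparse_coding(W, lam, sigma2, n_samples, rng):
    """Draw (z, x) pairs from a linear sparse coding model.

    Assumed parameterization (illustrative, not taken from the paper):
      z_h ~ Laplace(0, lam_h),  x | z ~ N(W z, sigma2 * I).
    """
    D, H = W.shape
    z = rng.laplace(loc=0.0, scale=lam, size=(n_samples, H))
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n_samples, D))
    x = z @ W.T + noise
    return z, x

def laplace_prior_entropy(lam):
    # Differential entropy of a factorized Laplace: sum_h (1 + log(2 * lam_h)).
    return float(np.sum(1.0 + np.log(2.0 * lam)))

def gaussian_noise_entropy(sigma2, D):
    # Differential entropy of N(mu, sigma2 * I) in D dimensions.
    return float(0.5 * D * (1.0 + np.log(2.0 * np.pi * sigma2)))

rng = np.random.default_rng(0)
D, H = 16, 8                      # hypothetical observed / latent dimensions
W = rng.normal(size=(D, H))
W_tilde = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit column lengths
lam = np.full(H, 1.0)             # hypothetical prior scales
sigma2 = 0.1                      # hypothetical noise variance

z, x = sample_sparse_coding(W_tilde, lam, sigma2, n_samples=1000, rng=rng)
print(z.shape, x.shape)
print(laplace_prior_entropy(lam), gaussian_noise_entropy(sigma2, D))
```

Both entropies depend only on $\boldsymbol{\lambda}$ and $\sigma^2$, not on the sampled data, which is why they can be evaluated without any Monte Carlo estimate.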

Paper Structure

This paper contains 29 sections, 6 theorems, 104 equations, 35 figures, 1 table.

Key Result

Theorem 1

Consider the ELBO in EqnELBO for the sparse coding model in EqnPSC2 with parameters $\Theta=(\boldsymbol{\lambda},\tilde{W},\sigma^2)$. If the parameters $\boldsymbol{\lambda}$ and $\sigma^2$ are at a stationary point, then for any variational distribution $q_{\Phi}(\mathbf{z})$ and for any matrix $\tilde{W}$ (with unit column lengths) it holds that:

Figures (35)

  • Figure 1: Latent variable model. Left: graphical model representation corresponding to many popular latent variable models, including VAEs. Right: graphical model with learnable prior parameters and constrained likelihood parameters as used in this work.
  • Figure 2: Training data samples
  • Figure 3: Learned generative fields
  • Figure 5: Optimization of entropy-ELBOs. Two non-amortized optimizations and two amortized optimizations are shown. Two optimizations use annealing.
  • Figure 6: No annealing, epoch 10
  • ...and 30 more figures

Theorems & Definitions (12)

  • Theorem 1: ELBO converges to a sum of entropies
  • proof
  • Theorem 2: Optimal scales and variance
  • proof: Proof sketch
  • Theorem 3
  • proof: Proof sketch
  • Theorem 4
  • proof
  • Lemma 1: Equality of gradients on manifold of optimal scales and variance
  • proof
  • ...and 2 more