Sample as You Infer: Predictive Coding With Langevin Dynamics

Umais Zahid, Qinghai Guo, Zafeirios Fountas

TL;DR

This paper presents a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience, and validates a lightweight, easily computable form of preconditioning.

Abstract

We present a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience. Our approach modifies the standard PC algorithm to bring performance on par with, and in some cases exceeding, that obtained from standard variational auto-encoder (VAE) training. By injecting Gaussian noise into the PC inference procedure we re-envision it as overdamped Langevin sampling, which facilitates optimisation with respect to a tight evidence lower bound (ELBO). We improve the resultant encoder-free training method by incorporating an encoder network to provide an amortised warm-start to our Langevin sampling, and test three different objectives for doing so. Finally, to increase robustness to the sampling step size and reduce sensitivity to curvature, we validate a lightweight and easily computable form of preconditioning, inspired by Riemann Manifold Langevin and adaptive optimizers from the SGD literature. We compare against VAEs by training like-for-like generative models with our technique and with standard reparameterisation-trick-based ELBOs. We observe that our method matches or outperforms VAE training across a number of metrics, including sample quality, while converging in a fraction of the number of SGD training iterations.
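
To make the "noise injection" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy linear-Gaussian model (z ~ N(0, I), x | z ~ N(Wz, sigma^2 I)) chosen only because its posterior is known in closed form; the names `grad_log_joint`, `infer`, `W`, `x` and `sigma` are hypothetical. With `noise=False` the loop is plain gradient ascent on log p(x, z), which settles on the posterior mode, as a MAP-style PC inference would. With `noise=True` each step adds sqrt(2 * step) * xi, which turns the same loop into an unadjusted overdamped Langevin sampler whose iterates spread over the posterior.

```python
import numpy as np

def grad_log_joint(z, x, W, sigma):
    """Gradient of log p(x, z) w.r.t. z for the assumed toy model
    z ~ N(0, I), x | z ~ N(W z, sigma^2 I)."""
    return -z + W.T @ (x - W @ z) / sigma**2

def infer(x, W, sigma, step, n_steps, noise, rng):
    """Iterate latent updates from z = 0.

    noise=False: gradient ascent on log p(x, z) (converges to the mode).
    noise=True:  adds sqrt(2 * step) * xi per step (unadjusted Langevin).
    """
    z = np.zeros(W.shape[1])
    trajectory = np.empty((n_steps, z.size))
    for t in range(n_steps):
        z = z + step * grad_log_joint(z, x, W, sigma)
        if noise:
            z = z + np.sqrt(2.0 * step) * rng.standard_normal(z.size)
        trajectory[t] = z
    return trajectory

rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical decoder weights
x = np.array([1.0, -0.5, 0.8])                      # hypothetical observation
sigma = 1.0

# Closed-form posterior mean of the toy model, used only as a reference.
precision = np.eye(2) + W.T @ W / sigma**2
post_mean = np.linalg.solve(precision, W.T @ x / sigma**2)

map_traj = infer(x, W, sigma, step=0.05, n_steps=500, noise=False, rng=rng)
langevin_traj = infer(x, W, sigma, step=0.05, n_steps=20000, noise=True, rng=rng)
samples = langevin_traj[2000:]  # discard burn-in

print("posterior mean      :", post_mean)
print("gradient-ascent end :", map_traj[-1])
print("Langevin sample mean:", samples.mean(axis=0))
print("Langevin sample var :", samples.var(axis=0))
```

In this toy setting the noiseless trajectory collapses to a single point, whereas the Langevin samples have non-zero variance around the same mean. A learning algorithm could average parameter gradients over such samples rather than over a point estimate; how the paper does this is not specified in the text above.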

Paper Structure

This paper contains 21 sections, 10 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Projection of high-dimensional latent state trajectories under standard PC inference (right), and Langevin PC sampling (left), using normalised PCA trajectories. Latent state dynamics under Langevin PC result in a principled exploration of the posterior. More example trajectories, and further details on how these were computed, may be found in Appendix \ref{appendix:pca_latent_trajectory}. Contour lines and hue correspond to values of the negative log joint probability (blue high, red low); marker brightness corresponds to time-step (earlier is lighter).
  • Figure 2: FID when using amortised warm-starts trained with our three approximate inference objectives, and a baseline with no warm-start model that initialises from the prior. $^*$ Values for the forward KL objective are reported for 1 epoch because the instability of this objective results in exploding gradients.
  • Figure 3: Changes in log probability ($\Delta \log p({\bm{x}}, {\bm{z}})$) during Langevin sampling show forward KL initialisation results in long periods of drift-dominant conditions far from the mode.
  • Figure 4: FID for Langevin PC models with and without preconditioning across different step-sizes. Numbers in brackets correspond to the preconditioning decay rate ($\beta$). Models trained with preconditioned Langevin dynamics experience significantly less degradation in sample quality at higher step-sizes, with stronger preconditioning generally correlating with the greatest robustness to the inference learning rate.
  • Figure 5: (A) Samples from identical generative models trained as VAEs (left), with LPC (middle), and with preconditioned LPC (right) on CelebA 64x64 (top), and SVHN (bottom). Epoch 50 samples for VAE models can be found in Appendix \ref{appendix:epoch_50_samples}. (B) Sample FID curves of VAE and LPC models throughout training. LPC models generally converge in significantly fewer epochs than their equivalent VAE-trained models, with certain models converging in as few as 3 epochs. $^\dag$ Note: FID values reported in this graph are calculated online during training using significantly fewer samples than the post-training values reported in Table \ref{tab:final_results}, and may thus differ in precise value.
  • ...and 4 more figures
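
The step-size robustness described in the Figure 4 caption can be illustrated with a small sketch. The paper's exact preconditioner is not given in the text above, so this example assumes one common diagonal form: an RMSprop-style running average of squared gradients with decay rate `beta`, used to rescale both the drift and the noise. It omits the curvature-correction term that a state-dependent preconditioner formally requires. The target (a Gaussian with per-coordinate curvatures 1 and 100) and the names `grad_log_p`, `langevin` and `curvature` are hypothetical.

```python
import numpy as np

def grad_log_p(z, curvature):
    """Gradient of log N(z; 0, diag(1 / curvature)), an assumed toy target."""
    return -curvature * z

def langevin(z0, curvature, step, n_steps, rng, beta=None, eps=1e-8):
    """Unadjusted Langevin sampling.

    beta=None: no preconditioning.
    beta in (0, 1): assumed RMSprop-style diagonal preconditioner,
    v <- beta * v + (1 - beta) * g**2, scale = 1 / (sqrt(v) + eps).
    """
    z = np.array(z0, dtype=float)
    v = np.zeros_like(z)
    out = np.empty((n_steps, z.size))
    for t in range(n_steps):
        g = grad_log_p(z, curvature)
        if beta is None:
            scale = np.ones_like(z)
        else:
            v = beta * v + (1.0 - beta) * g**2
            scale = 1.0 / (np.sqrt(v) + eps)
        xi = rng.standard_normal(z.size)
        # The drift and the noise variance are rescaled by the same factor.
        z = z + step * scale * g + np.sqrt(2.0 * step * scale) * xi
        out[t] = z
    return out

curvature = np.array([1.0, 100.0])  # ill-conditioned: second axis is 100x stiffer
step = 0.05                         # too large for the stiff axis without scaling

plain = langevin([3.0, 3.0], curvature, step, 300, np.random.default_rng(0))
precond = langevin([3.0, 3.0], curvature, step, 300,
                   np.random.default_rng(0), beta=0.99)

print("plain |z_2| at end         :", abs(plain[-1, 1]))
print("preconditioned |z_2| at end:", abs(precond[-1, 1]))
```

Without preconditioning, the stiff coordinate is multiplied by (1 - 0.05 * 100) = -4 at every step and blows up. With the running-RMS rescaling, the drift per step is bounded by step / sqrt(1 - beta), so the same step size remains usable.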