Zero-Variance Gradients for Variational Autoencoders

Zilei Shao; Anji Liu; Guy Van den Broeck

TL;DR

This paper introduces a training paradigm that uses an analytically computed gradient to guide early encoder learning before annealing to a standard stochastic estimator. Its results suggest that architectural choices enabling analytic expectation computation can significantly stabilize the training of generative models with stochastic components.

Abstract

Training deep generative models like Variational Autoencoders (VAEs) requires propagating gradients through stochastic latent variables, which introduces estimation variance that can slow convergence and degrade performance. In this paper, we explore a direction orthogonal to designing improved stochastic estimators, which we call Silent Gradients. We show that by restricting the decoder architecture in specific ways, the expected evidence lower bound (ELBO) can be computed analytically. This yields gradients with zero estimation variance, because the ELBO is computed directly rather than estimated from Monte Carlo samples of the latent variables. We first provide a theoretical analysis in a controlled setting with a linear decoder and demonstrate improved optimization compared to standard estimators. To extend this idea to expressive nonlinear decoders, we introduce a training paradigm that uses the analytic gradient to guide early encoder learning before annealing to a standard stochastic estimator. Across multiple datasets, our approach consistently improves established baselines, including reparameterization, Gumbel-Softmax, and REINFORCE. These results suggest that architectural choices enabling analytic expectation computation can significantly stabilize the training of generative models with stochastic components.
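To make the linear-decoder case concrete, the sketch below illustrates why an expectation over the latent variable can be computed without sampling. It is not the paper's implementation: it assumes a diagonal Gaussian posterior z ~ N(mu, diag(sigma^2)), a linear decoder W z + b, and a squared-error reconstruction term, and every function and variable name is hypothetical. Under those assumptions, E||x - (W z + b)||^2 = ||x - (W mu + b)||^2 + sum_j sigma_j^2 ||W[:, j]||^2, which the code compares against a reparameterized Monte Carlo estimate.

```python
import random


def analytic_expected_sq_error(x, W, b, mu, sigma):
    """Closed-form E_{z ~ N(mu, diag(sigma^2))} ||x - (W z + b)||^2.

    Assumed setup (not taken from the paper): linear decoder, squared error.
    """
    d_out, d_lat = len(W), len(mu)
    # Squared error of the decoder evaluated at the posterior mean.
    mean_term = 0.0
    for i in range(d_out):
        pred = b[i] + sum(W[i][j] * mu[j] for j in range(d_lat))
        mean_term += (x[i] - pred) ** 2
    # trace(W diag(sigma^2) W^T) = sum_j sigma_j^2 * ||W[:, j]||^2.
    var_term = sum(
        sigma[j] ** 2 * sum(W[i][j] ** 2 for i in range(d_out))
        for j in range(d_lat)
    )
    return mean_term + var_term


def monte_carlo_sq_error(x, W, b, mu, sigma, n_samples, rng):
    """Reparameterized Monte Carlo estimate of the same expectation."""
    d_out, d_lat = len(W), len(mu)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
        z = [mu[j] + sigma[j] * rng.gauss(0.0, 1.0) for j in range(d_lat)]
        for i in range(d_out):
            pred = b[i] + sum(W[i][j] * z[j] for j in range(d_lat))
            total += (x[i] - pred) ** 2
    return total / n_samples


# Toy values chosen only for illustration.
x = [1.0, -0.5, 2.0]
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b = [0.1, 0.0, -0.2]
mu = [0.4, -0.7]
sigma = [0.3, 0.5]

exact = analytic_expected_sq_error(x, W, b, mu, sigma)
estimate = monte_carlo_sq_error(x, W, b, mu, sigma, 20000, random.Random(0))
print("analytic:", exact, "monte carlo:", estimate)
```

The analytic value is a deterministic function of mu, sigma, W, and b, so its gradients with respect to the encoder outputs carry no sampling noise. The Monte Carlo estimate only approaches the same value as the number of samples grows.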
