On the Natural Gradient of the Evidence Lower Bound
Nihat Ay, Jesse van Oostrum, Adwait Datar
TL;DR
This work analyzes how the Fisher-Rao (natural) gradient behaves for the evidence lower bound (ELBO) in variational inference. Adopting an information-geometric perspective, it relates ELBO optimization on an extended space with hidden units to learning the target distribution on the visible space, and identifies a cylindrical-model condition under which the natural gradient of the ELBO coincides with the natural gradient of the evidence. The core results show that, for cylindrical models, the variational gap has no effect on learning and the ELBO gradient maps to the evidence gradient; for non-cylindrical models, this invariance can fail, which motivates geometric criteria for when the equivalence is preserved. The findings give a theoretical justification for natural-gradient-based ELBO optimization in the unconstrained (full) and cylindrical settings, and clarify gradient behavior in Bayesian graphical models via tangent-space decompositions.
Abstract
This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound (ELBO), which plays a central role in generative machine learning. It reveals that the gap between the evidence and its lower bound, the ELBO, has an essentially vanishing natural gradient under unconstrained optimization. As a result, maximization of the ELBO is equivalent to minimization of the Kullback-Leibler divergence from a target distribution, the primary objective function of learning. Building on this insight, we derive a condition under which this equivalence persists even when optimization is constrained to a model. This condition yields a geometric characterization, which we formalize through the notion of a cylindrical model.
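The gap between the evidence and the ELBO mentioned above can be made concrete with a small numerical sketch. It is not taken from the article: it assumes the standard textbook identity log p(x) = ELBO(x, q) + KL(q(z) || p(z | x)) for a discrete latent variable z, and the joint table `P_JOINT`, the variational distribution `q`, and all function names are hypothetical choices made for illustration. It does not compute any natural gradient; it only shows what the gap is.

```python
import math

# Hypothetical toy joint distribution p(x, z) (an assumption for illustration):
# 2 visible states x, 3 hidden states z. Entries sum to 1.
P_JOINT = [
    [0.10, 0.25, 0.05],  # x = 0
    [0.30, 0.10, 0.20],  # x = 1
]


def log_evidence(x):
    """log p(x), obtained by marginalizing the hidden variable z."""
    return math.log(sum(P_JOINT[x]))


def posterior(x):
    """Exact posterior p(z | x) of the toy model."""
    px = sum(P_JOINT[x])
    return [p / px for p in P_JOINT[x]]


def elbo(x, q):
    """ELBO(x, q) = sum_z q(z) * log(p(x, z) / q(z))."""
    return sum(
        qz * math.log(P_JOINT[x][z] / qz) for z, qz in enumerate(q) if qz > 0
    )


def kl(q, p):
    """Kullback-Leibler divergence KL(q || p) for discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)


def variational_gap(x, q):
    """Gap between evidence and ELBO: KL(q(z) || p(z | x))."""
    return kl(q, posterior(x))


# Hypothetical variational distribution over z (the same one is used for both x).
q = [0.5, 0.3, 0.2]

for x in (0, 1):
    lhs = log_evidence(x)
    rhs = elbo(x, q) + variational_gap(x, q)
    print(f"x={x}: log p(x)={lhs:.6f}, ELBO + gap={rhs:.6f}")
```

In this toy setting the gap is non-negative and vanishes exactly when `q` equals the posterior, which is why the ELBO is a lower bound on the log-evidence. The article's claim is about the natural gradient of this gap, which the sketch does not attempt to reproduce.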
