Optimising Distributions with Natural Gradient Surrogates
Jonathan So, Richard E. Turner
TL;DR
This work tackles the challenge of computing natural gradients when optimising the parameters of a probability distribution, by reframing the problem in terms of a surrogate distribution $\tilde{q}$ for which natural gradient descent (NGD) is easy to compute. It formalises surrogate natural gradient descent (SNGD), proves equivalence under suitable conditions, and introduces exponential-family surrogates with auxiliary-parameter extensions to broaden applicability. The authors show that several existing NGD methods are instances of SNGD and demonstrate substantial speedups across maximum likelihood estimation (MLE) and variational inference (VI) tasks, including negative-binomial, skew-elliptical, elliptical-copula, and mixture models. The approach is scalable, general, and easy to implement with standard autodiff, offering a practical way to apply natural gradients to a wider class of distributions and problems.
Abstract
Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient poses a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.
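To give a feel for why natural gradients can be cheap for exponential families under standard autodiff, the following is a minimal JAX sketch of a well-known identity: for a minimal exponential family with natural parameters $\eta$ and mean parameters $\mu$, the natural gradient with respect to $\eta$ equals the ordinary gradient with respect to $\mu$. It is not the authors' SNGD algorithm. The 1D Gaussian family, the function names, and the toy objective `loss` are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a tight comparison

def log_partition(eta):
    # 1D Gaussian with natural parameters eta = (m / v, -1 / (2 v)).
    eta1, eta2 = eta
    return -eta1**2 / (4.0 * eta2) - 0.5 * jnp.log(-2.0 * eta2)

def eta_from_mu(mu):
    # Mean parameters mu = (E[x], E[x^2]) mapped back to natural parameters.
    m = mu[0]
    v = mu[1] - m**2
    return jnp.array([m / v, -0.5 / v])

# Mean parameters are the gradient of the log-partition function.
mu_from_eta = jax.grad(log_partition)

def loss(eta):
    # Hypothetical objective: any differentiable function of eta would do.
    m = -eta[0] / (2.0 * eta[1])
    v = -0.5 / eta[1]
    return (m - 1.0) ** 2 + jnp.log(v) ** 2 + jnp.sin(m * v)

eta = eta_from_mu(jnp.array([0.5, 1.5]))  # mean 0.5, variance 1.25

# Route 1: explicit Fisher matrix (Hessian of the log-partition), then a solve.
fisher = jax.hessian(log_partition)(eta)
nat_grad_fisher = jnp.linalg.solve(fisher, jax.grad(loss)(eta))

# Route 2: differentiate the same objective with respect to mean parameters.
mu = mu_from_eta(eta)
nat_grad_mu = jax.grad(lambda mu_: loss(eta_from_mu(mu_)))(mu)

# One natural gradient descent step in natural parameters (step size is arbitrary).
eta_next = eta - 0.1 * nat_grad_mu
```

The two routes agree up to numerical error, but the second never forms or inverts the Fisher matrix, which is what makes it attractive when the parameter dimension is large.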
