Affine calculus for constrained minima of the Kullback-Leibler divergence
Giovanni Pistone
TL;DR
The paper develops a non-parametric, dually affine Information Geometry framework on the open probability simplex, formalized via a statistical bundle $S\mathbf{M}(\Omega)$ with dual exponential and mixture transports, to study constrained KL minimization. It derives explicit total natural gradients for key divergences, including $D(p\|q)$, the cross entropy, the entropy, and the Jensen–Shannon divergence, and shows how Fisher's score becomes a moving-chart velocity within this geometry. The analysis is then specialized to product spaces, yielding principled treatments of marginalization, mean-field approximations, Kantorovich and Schrödinger transport, and variational Bayes; explicit gradient-flow forms enable systematic optimization in these settings. The framework unifies Fisherian statistics with concepts from transport and statistical physics, offering a principled toolkit for gradient-based learning on non-parametric probability simplices and suggesting avenues for algorithmic development and extensions to continuous spaces. Key formulas include the total natural gradient of the KL divergence, $\operatorname{grad} D(q,r) = \big(-s_q(r), -\eta_r(q)\big)$, and the $q$-component of the Jensen–Shannon gradient, $\operatorname{grad}_q \mathrm{JS}(q,r) = -\tfrac{1}{2}\, s_q\big(\tfrac{q+r}{2}\big)$.
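The gradient formulas above can be checked numerically on a finite sample space. This is a minimal sketch that assumes the usual conventions of the non-parametric setting, which this summary does not spell out:

- The fiber at $q$ is the space of random variables with zero $q$-mean.
- The Fisher metric is $\langle a, b\rangle_q = E_q[ab]$.
- The exponential chart is $s_q(r) = \log(r/q) - E_q[\log(r/q)]$.
- The mixture chart is $\eta_r(q) = q/r - 1$.

Along exponential curves with a given velocity, the sketch compares finite-difference derivatives of $D(q\|r)$ and $\mathrm{JS}(q,r)$ with the Fisher pairing of the claimed gradients and that velocity. All helper names (`kl`, `s`, `eta`, `exp_curve`, `fisher`, ...) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(q, r):
    """Kullback-Leibler divergence D(q||r) on a finite sample space."""
    return float(np.sum(q * np.log(q / r)))

def js(q, r):
    """Jensen-Shannon divergence with midpoint m = (q + r) / 2."""
    m = 0.5 * (q + r)
    return 0.5 * kl(q, m) + 0.5 * kl(r, m)

def s(p, q):
    """Exponential chart centred at p (assumed convention): log(q/p) minus its p-mean."""
    v = np.log(q / p)
    return v - np.sum(p * v)

def eta(p, q):
    """Mixture chart centred at p (assumed convention): q/p - 1, which has zero p-mean."""
    return q / p - 1.0

def random_density(n):
    """Strictly positive density on {0, ..., n-1}, i.e. a point of the open simplex."""
    w = rng.random(n) + 0.1
    return w / w.sum()

def fiber_vector(p):
    """Random vector centred under p, i.e. an element of the fiber at p."""
    u = rng.standard_normal(p.size)
    return u - np.sum(p * u)

def exp_curve(p, u, t):
    """Exponential curve p * exp(t u) / Z(t); its score at t = 0 is u (u centred under p)."""
    w = p * np.exp(t * u)
    return w / w.sum()

def fisher(p, a, b):
    """Fisher metric on the fiber at p: <a, b>_p = E_p[a b]."""
    return float(np.sum(p * a * b))

n, h = 5, 1e-5
q, r = random_density(n), random_density(n)
u, v = fiber_vector(q), fiber_vector(r)

# Central finite differences of D and JS along exponential curves.
dD_q = (kl(exp_curve(q, u, h), r) - kl(exp_curve(q, u, -h), r)) / (2 * h)
dD_r = (kl(q, exp_curve(r, v, h)) - kl(q, exp_curve(r, v, -h))) / (2 * h)
dJS_q = (js(exp_curve(q, u, h), r) - js(exp_curve(q, u, -h), r)) / (2 * h)

# Claimed natural gradients, paired with the velocities through the Fisher metric.
grad_D_q = -s(q, r)
grad_D_r = -eta(r, q)
grad_JS_q = -0.5 * s(q, 0.5 * (q + r))

print(dD_q, fisher(q, grad_D_q, u))
print(dD_r, fisher(r, grad_D_r, v))
print(dJS_q, fisher(q, grad_JS_q, u))
```

Each printed pair should agree up to finite-difference error. Both KL components are centred in their own fibers: $E_q[s_q(r)] = 0$ and $E_r[\eta_r(q)] = 0$. This is why the $-s_q(r)$ and $-\eta_r(q)$ forms, rather than the raw $\log(q/r)$ and $-q/r$, are the natural gradients.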
Abstract
The non-parametric version of Amari's dually affine Information Geometry provides a practical calculus to perform computations of interest in statistical machine learning. The method uses the notion of a statistical bundle, a mathematical structure that includes both probability densities and random variables to capture the spirit of Fisherian statistics. We focus on computations involving a constrained minimization of the Kullback-Leibler divergence. We show how to obtain neat and principled versions of known computations in applications such as mean-field approximation, adversarial generative models, and variational Bayes.
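Mean-field approximation is perhaps the simplest instance of a constrained KL minimum. The task is to minimize $D(q_1 \otimes q_2 \| p)$ over product densities on $\Omega_1 \times \Omega_2$.

The sketch below uses the classical coordinate-wise fixed-point iteration: set $q_1(x) \propto \exp\big(E_{q_2}[\log p(x, Y)]\big)$, then update $q_2$ symmetrically. This is the standard textbook scheme, not necessarily the gradient-flow form derived in the paper, and the function names are hypothetical. Each update exactly minimizes the divergence in one factor while the other is held fixed, so the recorded divergence should never increase.

```python
import numpy as np

def kl(q, p):
    """D(q||p) for strictly positive densities stored as arrays of equal shape."""
    return float(np.sum(q * np.log(q / p)))

def normalized_exp(g):
    """Density proportional to exp(g), computed stably."""
    w = np.exp(g - g.max())
    return w / w.sum()

def mean_field(p, iters=50):
    """Coordinate-wise minimisation of D(q1 x q2 || p) over product densities."""
    log_p = np.log(p)
    n1, n2 = p.shape
    q1 = np.full(n1, 1.0 / n1)
    q2 = np.full(n2, 1.0 / n2)
    history = [kl(np.outer(q1, q2), p)]
    for _ in range(iters):
        q1 = normalized_exp(log_p @ q2)  # q1(x) ∝ exp(sum_y q2(y) log p(x, y))
        q2 = normalized_exp(q1 @ log_p)  # q2(y) ∝ exp(sum_x q1(x) log p(x, y))
        history.append(kl(np.outer(q1, q2), p))
    return q1, q2, history

rng = np.random.default_rng(1)
p = rng.random((3, 4)) + 0.05
p /= p.sum()
q1, q2, history = mean_field(p)
print(history[0], history[-1])
```

The iteration only guarantees a monotone decrease toward a stationary point. In general it need not reach the global minimum over the product family.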
