Affine calculus for constrained minima of the Kullback-Leibler divergence

Giovanni Pistone

TL;DR

The paper develops a non-parametric, dually affine Information Geometry framework on the open probability simplex, formalized via a statistical bundle S𝔼(Ω) and dual exponential/mixture transports, to study constrained KL minimization. It derives explicit total natural gradients for key divergences, including $D(p \| q)$, the cross entropy, entropy, and Jensen–Shannon divergence, and shows how Fisher’s score becomes a moving-chart velocity within this geometry. The analysis is then specialized to product spaces, yielding principled treatments of marginalization, mean-field approximations, Kantorovich and Schrödinger transport, and variational Bayes; concrete gradient-flow forms enable systematic optimization in these settings. The framework unifies Fisherian statistics with transport and statistical physics concepts, offering a principled toolkit for gradient-based learning on non-parametric probability simplices and suggesting avenues for algorithmic development and continuous-space extensions. Key formulas include the total natural gradient $\operatorname{grad} D(q,r) = (-s_q(r), -\eta_r(q))$ and the JS gradient $\operatorname{grad} \mathrm{JS}(q,r) = -\tfrac{1}{2}\, s_q((q+r)/2)$.
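The two key formulas can be checked numerically on a small finite sample space. The sketch below is illustrative only and rests on several assumptions that this summary does not spell out: the exponential chart is taken as $s_q(r) = \log(r/q) - \mathbb{E}_q[\log(r/q)]$, the mixture chart as $\eta_r(q) = q/r - 1$, the divergence as $D(q,r) = \mathbb{E}_q[\log(q/r)]$, and the JS formula is read as the component of the gradient with respect to $q$. The helper names (`s_chart`, `eta_chart`, `exp_curve`, `directional_checks`) are hypothetical. Each check compares a finite-difference derivative along an exponential curve $t \mapsto p\,e^{tu}/Z(t)$ with the inner product $\mathbb{E}_p[\operatorname{grad} \cdot u]$ predicted by the formula.

```python
import math

def expect(p, f):
    """Expectation of the random variable f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def center(p, f):
    """Subtract the p-mean so that the result lies in the fiber at p."""
    m = expect(p, f)
    return [fi - m for fi in f]

def kl(q, r):
    """D(q, r) = E_q[log(q / r)] for strictly positive q and r."""
    return sum(qi * math.log(qi / ri) for qi, ri in zip(q, r))

def js(q, r):
    """Jensen-Shannon divergence with midpoint m = (q + r) / 2."""
    m = [(qi + ri) / 2 for qi, ri in zip(q, r)]
    return 0.5 * kl(q, m) + 0.5 * kl(r, m)

def s_chart(q, r):
    """Assumed exponential chart: s_q(r) = log(r/q) - E_q[log(r/q)]."""
    return center(q, [math.log(ri / qi) for qi, ri in zip(q, r)])

def eta_chart(r, q):
    """Assumed mixture chart: eta_r(q) = q/r - 1."""
    return [qi / ri - 1.0 for qi, ri in zip(q, r)]

def exp_curve(p, u, t):
    """Exponential curve through p; its score at t = 0 is u centered at p."""
    w = [pi * math.exp(t * ui) for pi, ui in zip(p, u)]
    z = sum(w)
    return [wi / z for wi in w]

def central_diff(f, h=1e-5):
    """Central finite difference of f at t = 0."""
    return (f(h) - f(-h)) / (2 * h)

def directional_checks(q, r, direction):
    """Return (finite difference, predicted inner product) pairs."""
    u = center(q, direction)  # velocity of a curve through q
    v = center(r, direction)  # velocity of a curve through r
    m = [(qi + ri) / 2 for qi, ri in zip(q, r)]
    grad_q = [-x for x in s_chart(q, r)]         # -s_q(r)
    grad_r = [-x for x in eta_chart(r, q)]       # -eta_r(q)
    grad_js = [-0.5 * x for x in s_chart(q, m)]  # -(1/2) s_q((q+r)/2)
    return {
        "kl_q": (central_diff(lambda t: kl(exp_curve(q, u, t), r)),
                 expect(q, [g * x for g, x in zip(grad_q, u)])),
        "kl_r": (central_diff(lambda t: kl(q, exp_curve(r, v, t))),
                 expect(r, [g * x for g, x in zip(grad_r, v)])),
        "js_q": (central_diff(lambda t: js(exp_curve(q, u, t), r)),
                 expect(q, [g * x for g, x in zip(grad_js, u)])),
    }

if __name__ == "__main__":
    pairs = directional_checks([0.2, 0.3, 0.5], [0.5, 0.25, 0.25], [1.0, -2.0, 0.5])
    for name, (fd, predicted) in pairs.items():
        print(name, fd, predicted)
```

In each pair the two numbers should agree up to finite-difference error. The sketch only tests directional derivatives at one point, so it is a sanity check of the stated formulas under the assumed chart definitions rather than a reproduction of the paper's derivation.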

Abstract

The non-parametric version of Amari's dually affine Information Geometry provides a practical calculus to perform computations of interest in statistical machine learning. The method uses the notion of a statistical bundle, a mathematical structure that includes both probability densities and random variables to capture the spirit of Fisherian statistics. We focus on computations involving a constrained minimization of the Kullback-Leibler divergence. We show how to obtain neat and principled versions of known computations in applications such as mean-field approximation, adversarial generative models, and variational Bayes.

Paper Structure

Table of Contents

Key Result

Theorems & Definitions (11)