The EM Algorithm in Information Geometry
Sammy Suliman
TL;DR
By modeling families of distributions as statistical manifolds endowed with the Fisher information metric $g_{FIM}$, the thesis develops an information-geometric view of the EM algorithm. It introduces $e$- and $m$-geodesics and $e$- and $m$-projections, and links the EM updates to KL divergence and the maximum-entropy principle via Theorems 5.3–5.5. A Pythagorean identity for dually flat manifolds is derived, and several results relating the EM and $em$ algorithms are established, including conditions under which the two coincide. The work also connects these geometric ideas to Bayesian conditioning and exponential-family representations, and discusses practical implications for deep learning through natural gradient methods, demonstrated with Python code in Appendices A and B. Overall, the approach provides a unifying geometric framework for probabilistic inference and optimization in machine learning.
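The Pythagorean identity can be checked numerically in a simple special case. The sketch below assumes a one-parameter exponential family $p_\theta(x) \propto e^{\theta x}$ on the finite support $\{0,\dots,4\}$ (an illustrative choice, not the thesis's construction); the helper names `exp_family`, `kl` and `m_projection` are hypothetical. For a distribution $q$, its $m$-projection $p^*$ onto this $e$-flat family is the member minimizing $\mathrm{KL}(q\|p)$, which is characterized by matching the mean, $\mathbb{E}_{p^*}[x] = \mathbb{E}_q[x]$; for every $p$ in the family one then expects $\mathrm{KL}(q\|p) = \mathrm{KL}(q\|p^*) + \mathrm{KL}(p^*\|p)$.

```python
import numpy as np

# Assumed setup: finite support {0, ..., K-1} with sufficient statistic f(x) = x.
K = 5
xs = np.arange(K, dtype=float)

def exp_family(theta):
    """Hypothetical helper: p_theta(x) proportional to exp(theta * x) on the support."""
    logits = theta * xs
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def m_projection(q, lo=-50.0, hi=50.0, tol=1e-12):
    """Find theta* with E_{p_theta*}[x] = E_q[x] by bisection.

    The mean of p_theta is increasing in theta, so bisection on the
    moment-matching condition locates the m-projection of q.
    """
    target = float(q @ xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exp_family(mid) @ xs < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(K))  # an arbitrary target distribution
theta_star = m_projection(q)
p_star = exp_family(theta_star)

for theta in (-1.0, 0.3, 2.0):  # arbitrary members p of the e-flat family
    p = exp_family(theta)
    lhs = kl(q, p)
    rhs = kl(q, p_star) + kl(p_star, p)
    print(f"theta={theta:+.1f}  KL(q||p)={lhs:.8f}  KL(q||p*)+KL(p*||p)={rhs:.8f}")
```

The two printed columns should agree up to the bisection tolerance: expanding both sides shows that their difference is $(\theta^* - \theta)\,(\mathbb{E}_q[x] - \mathbb{E}_{p^*}[x])$, which the moment-matching condition sets to zero.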
Abstract
The purpose of this thesis is to convey the basic concepts of information geometry and its applications to non-specialists and those in applied fields, assuming only a first-year undergraduate background in calculus, linear algebra, and probability theory and statistics. We begin with an introduction to the EM algorithm, providing a typical use case in Python, before moving to an overview of basic Riemannian geometry. We then introduce the core concepts of information geometry and the $em$ algorithm, with an explicit calculation of both the $e$- and $m$-projections, before closing with a discussion of an important application of this research to deep learning, the natural gradient, together with a novel implementation in Python.
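As a rough illustration of the kind of EM use case the abstract refers to, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians on synthetic data. The function names `normal_pdf` and `em_gmm_1d`, the initialization of the means at the data minimum and maximum, and the fixed iteration count are assumptions made for this sketch, not the thesis's own code. The E-step computes posterior responsibilities of each component for each point; the M-step re-fits the mixing weights, means and variances by responsibility-weighted maximum likelihood.

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Gaussian density, broadcasting over x, mu and var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, n_iter=100):
    """Hypothetical helper: EM for a two-component 1-D Gaussian mixture."""
    pi = np.array([0.5, 0.5])              # mixing weights
    mu = np.array([x.min(), x.max()])      # assumed initialization: spread the means apart
    var = np.array([x.var(), x.var()])     # start both variances at the overall variance
    log_liks = []
    for _ in range(n_iter):
        # E-step: r[n, k] = P(z_n = k | x_n, current parameters)
        dens = pi * normal_pdf(x[:, None], mu, var)  # shape (N, 2)
        total = dens.sum(axis=1, keepdims=True)
        r = dens / total
        # Log-likelihood of the current parameters (computed before the M-step)
        log_liks.append(float(np.log(total).sum()))
        # M-step: responsibility-weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, log_liks

# Synthetic data: 300 points from N(-2, 1) and 200 points from N(3, 0.5^2).
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
pi, mu, var, ll = em_gmm_1d(x)
print("weights:", pi)
print("means:", mu)
print("variances:", var)
```

The recorded log-likelihood should be non-decreasing across iterations, which is the standard monotonicity property of EM, and the fitted means should land near the generating values of -2 and 3.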
