The EM Algorithm in Information Geometry

Sammy Suliman

TL;DR

By modeling families of probability distributions as statistical manifolds endowed with the Fisher information metric $g_{FIM}$, the paper develops an information-geometric view of the EM algorithm. It introduces $e$- and $m$-geodesics and $e$- and $m$-projections, and links the EM updates to the KL divergence and the maximum-entropy principle via Theorems 5.3–5.5. It derives a Pythagorean identity for dually flat manifolds and establishes several equivalence results between the EM and $em$ algorithms, including conditions under which they coincide. The work also connects these geometric ideas to Bayesian conditioning and exponential-family representations, and discusses practical implications for deep learning through natural gradient methods. Python code for the EM algorithm and the natural gradient is given in Appendices A and B, respectively. Overall, the approach provides a unifying geometric framework that informs probabilistic inference and optimization in machine learning.

Abstract

The purpose of this thesis is to convey the basic concepts of information geometry and its applications to non-specialists and those in applied fields, assuming only a first-year undergraduate background in calculus, linear algebra, and probability theory and statistics. We begin with an introduction to the EM algorithm, providing a typical use case in Python, before moving to an overview of basic Riemannian geometry. We then introduce the core concepts of information geometry and the $em$ algorithm, with an explicit calculation of both the $e$- and $m$-projections, before closing with a discussion of an important application of this research to deep learning, providing a novel implementation in Python.

Paper Structure (10 sections, 196 equations, 10 figures)

The EM algorithm
Introduction to Riemannian Geometry
The Fisher Information Metric
The Pythagorean Theorem
The em Algorithm
Further Applications
Acknowledgements
References
Appendix A: The EM Algorithm code example
Appendix B: The natural gradient code example
