Equivalence of Informations Characterizes Bregman Divergences
Philip S. Chodrow
TL;DR
The paper proves that Bregman divergences are the only divergences for which two information measures, the Jensen gap information $I_\phi(\mu,X)$ and the divergence information $I_d(\mu,X)$, agree on every weighted dataset $(\mu,X)$. It states this agreement as the information-equivalence condition $I_\phi(\mu,X)=I_d(\mu,X)$. The main theorem shows that if the condition holds for all $(\mu,X)$, then $d$ must equal the Bregman divergence $d_\phi$ generated by $\phi$, i.e., $d(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)^T(x-y)$. The result gives a new, rigorous characterization of Bregman divergences, a class that includes familiar instances such as KL divergence on the simplex and the squared Euclidean and Mahalanobis distances, with implications for clustering, quantization, and information-theoretic interpretations of loss.
Abstract
Bregman divergences are a class of distance-like comparison functions which play fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they cause two useful formulations of information content (in the sense of variability or non-uniformity) in a weighted collection of vectors to agree. In this note, we show that this agreement in fact characterizes the class of Bregman divergences; they are the only divergences which generate this agreement for arbitrary collections of weighted vectors.
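The agreement described above can be checked numerically in the easy direction (a Bregman divergence does make the two informations agree). The sketch below is illustrative only and rests on assumptions not spelled out in this summary: it takes the Jensen gap information to be $I_\phi(\mu,X)=\sum_i \mu_i\phi(x_i)-\phi(\sum_i \mu_i x_i)$ and evaluates the divergence information as $\sum_i \mu_i d(x_i,\bar{x})$ at the weighted mean $\bar{x}=\sum_i \mu_i x_i$, which is the standard minimizer for Bregman divergences. All function and variable names are hypothetical, and the data are randomly generated for the check.

```python
import numpy as np

def jensen_gap(phi, mu, X):
    """Assumed form of I_phi: weighted mean of phi minus phi of the weighted mean."""
    mean = mu @ X
    return float(sum(m * phi(x) for m, x in zip(mu, X)) - phi(mean))

def bregman(phi, grad_phi):
    """Return d_phi(x, y) = phi(x) - phi(y) - grad_phi(y)^T (x - y)."""
    def d(x, y):
        return float(phi(x) - phi(y) - grad_phi(y) @ (x - y))
    return d

def divergence_info_at_mean(d, mu, X):
    """Weighted divergence from each point to the weighted mean (assumed form of I_d)."""
    mean = mu @ X
    return float(sum(m * d(x, mean) for m, x in zip(mu, X)))

# Generator 1: negative entropy, whose Bregman divergence is KL on the simplex.
def neg_entropy(p):
    return float(np.sum(p * np.log(p)))

def grad_neg_entropy(p):
    return np.log(p) + 1.0

# Generator 2: squared Euclidean norm, whose Bregman divergence is ||x - y||^2.
def sq_norm(x):
    return float(x @ x)

def grad_sq_norm(x):
    return 2.0 * x

rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=6)  # six random points in the probability simplex
mu = rng.dirichlet(np.ones(6))         # random weights summing to one

kl = bregman(neg_entropy, grad_neg_entropy)
sq = bregman(sq_norm, grad_sq_norm)

gap_kl, info_kl = jensen_gap(neg_entropy, mu, X), divergence_info_at_mean(kl, mu, X)
gap_sq, info_sq = jensen_gap(sq_norm, mu, X), divergence_info_at_mean(sq, mu, X)

print("KL:         ", gap_kl, info_kl)
print("squared L2: ", gap_sq, info_sq)
```

For each generator the two printed values should coincide up to floating-point error. The converse direction, that only Bregman divergences achieve this for all $(\mu,X)$, is the content of the paper's theorem and is not something a finite numerical check can establish.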
