Equivalence of Informations Characterizes Bregman Divergences

Philip S. Chodrow

TL;DR

The paper proves that Bregman divergences are the only divergences for which two information measures, the Jensen gap information $I_\phi(\mu,X)$ and the divergence information $I_d(\mu,X)$, agree on every weighted dataset $(\mu,X)$. The information-equivalence condition is $I_\phi(\mu,X)=I_d(\mu,X)$. The main theorem shows that if this condition holds for all $(\mu,X)$, then $d$ must equal the Bregman divergence $d_\phi$, i.e., $d(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)^T(x-y)$. This gives a new characterization of Bregman divergences, whose familiar instances include KL divergence on the simplex and the squared Euclidean and Mahalanobis distances, with implications for clustering, quantization, and information-theoretic interpretations of loss.
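As a rough numerical illustration of the equivalence for the KL case, the sketch below compares the two quantities on a small weighted dataset on the simplex. It assumes the standard forms $I_\phi(\mu,X)=\sum_i \mu_i\phi(x_i)-\phi(\sum_i \mu_i x_i)$ and $I_d(\mu,X)=\inf_y \sum_i \mu_i d(x_i,y)$, which this summary does not spell out, and it uses the known fact that for a Bregman divergence the infimum is attained at the weighted mean. All function names and the example data are hypothetical.

```python
import math

def neg_entropy(x):
    """phi(x) = sum_j x_j log x_j, strictly convex on the open simplex."""
    return sum(xj * math.log(xj) for xj in x)

def kl(x, y):
    """KL(x || y): the Bregman divergence of neg_entropy on the simplex."""
    return sum(xj * math.log(xj / yj) for xj, yj in zip(x, y))

def weighted_mean(mu, X):
    """Coordinate-wise mean of the points in X under the weights mu."""
    dim = len(X[0])
    return [sum(m * x[j] for m, x in zip(mu, X)) for j in range(dim)]

def jensen_gap_information(mu, X, phi):
    """Assumed form: sum_i mu_i phi(x_i) - phi(weighted mean)."""
    return sum(m * phi(x) for m, x in zip(mu, X)) - phi(weighted_mean(mu, X))

def divergence_objective(mu, X, d, y):
    """sum_i mu_i d(x_i, y); its infimum over y is the assumed I_d."""
    return sum(m * d(x, y) for m, x in zip(mu, X))

# Illustrative weighted dataset: three points on the 2-simplex.
mu = [0.2, 0.5, 0.3]
X = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]

ybar = weighted_mean(mu, X)
I_phi = jensen_gap_information(mu, X, neg_entropy)
I_d = divergence_objective(mu, X, kl, ybar)       # objective at the mean
off_mean = divergence_objective(mu, X, kl, [1/3, 1/3, 1/3])

print(abs(I_phi - I_d) < 1e-12)  # the two informations agree
print(off_mean > I_d)            # any other centre gives a larger objective
```

The second comparison only probes one alternative centre, so it illustrates rather than proves that the mean is the minimizer.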

Abstract

Bregman divergences are a class of distance-like comparison functions which play fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they cause two useful formulations of information content (in the sense of variability or non-uniformity) in a weighted collection of vectors to agree. In this note, we show that this agreement in fact characterizes the class of Bregman divergences; they are the only divergences which generate this agreement for arbitrary collections of weighted vectors.

Paper Structure (4 sections, 6 theorems, 25 equations)

Introduction
Bregman Divergence and Two Informations
Main Result
Discussion

Key Result

Lemma 1

If $d = d_\phi$, then the pair $(\phi, d)$ satisfies the information equivalence property.
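A short derivation may help show why the lemma holds. The sketch below assumes the standard forms of the two informations (the summary does not state them), a weight vector $\mu$ summing to one, and a strictly convex, differentiable $\phi$; the symbol $\bar{x}$ for the weighted mean is introduced here for convenience.

```latex
% Assumed definitions, with \bar{x} = \sum_i \mu_i x_i:
%   I_\phi(\mu, X) = \sum_i \mu_i \phi(x_i) - \phi(\bar{x})
%   I_d(\mu, X)    = \inf_y \sum_i \mu_i \, d(x_i, y)
\begin{align*}
\sum_i \mu_i \, d_\phi(x_i, y)
  &= \sum_i \mu_i \left[ \phi(x_i) - \phi(y) - \nabla\phi(y)^T (x_i - y) \right] \\
  &= \sum_i \mu_i \phi(x_i) - \phi(y) - \nabla\phi(y)^T (\bar{x} - y) \\
  &= \Big[ \sum_i \mu_i \phi(x_i) - \phi(\bar{x}) \Big]
     + \Big[ \phi(\bar{x}) - \phi(y) - \nabla\phi(y)^T (\bar{x} - y) \Big] \\
  &= I_\phi(\mu, X) + d_\phi(\bar{x}, y).
\end{align*}
% Since d_\phi(\bar{x}, y) \ge 0 with equality iff y = \bar{x},
% the infimum over y is attained at y = \bar{x}, giving
% I_{d_\phi}(\mu, X) = I_\phi(\mu, X).
```

The main theorem is the converse direction: under these assumptions, no divergence other than $d_\phi$ produces this agreement for every $(\mu, X)$.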

Theorems & Definitions (15)

Definition 1: Jensen Gap Information
Definition 2: Divergence
Definition 3: Divergence Information
Definition 4: Information Equivalence
Lemma 1: Information Equivalence with Bregman Divergences [banerjeeOptimalBregmanPrediction2004; banerjeeClusteringBregmanDivergences2004]
Theorem 1
Lemma 2
proof
Lemma 3
proof
...and 5 more
