
Equivalence of Informations Characterizes Bregman Divergences

Philip S. Chodrow

TL;DR

The paper proves that Bregman divergences are the unique divergences that preserve an equality between two information measures, the Jensen gap information and the divergence information, for all weighted datasets. It defines the Jensen gap information $I_\phi(\mu,X)$ and the divergence information $I_d(\mu,X)$ of a weighted dataset $(\mu,X)$, and introduces the information-equivalence condition $I_\phi(\mu,X)=I_d(\mu,X)$. The main theorem shows that if this condition holds for all $(\mu,X)$, then $d$ must equal the Bregman divergence $d_\phi$, i.e., $d(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)^T(x-y)$. This result gives a new, rigorous characterization of Bregman divergences and connects them to familiar instances such as the KL divergence on the probability simplex and the squared Euclidean and squared Mahalanobis distances, with implications for clustering, quantization, and information-theoretic interpretations of loss.
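As a sanity check on the formula $d_\phi(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)^T(x-y)$ and the instances it names, the following pure-Python sketch evaluates the Bregman divergence for three generating functions. All helper names are illustrative assumptions, not taken from the paper; the matrix $A$ stands in for an inverse covariance matrix in the Mahalanobis case. Note that the quadratic generators yield the squared Euclidean and squared Mahalanobis distances, not the unsquared ones.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Sequence[Vector], x: Vector) -> List[float]:
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def bregman(phi: Callable[[Vector], float],
            grad_phi: Callable[[Vector], List[float]],
            x: Vector, y: Vector) -> float:
    """d_phi(x, y) = phi(x) - phi(y) - grad_phi(y)^T (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    return phi(x) - phi(y) - dot(grad_phi(y), diff)


# phi(x) = ||x||^2 generates the squared Euclidean distance ||x - y||^2.
def sq_norm(x: Vector) -> float:
    return dot(x, x)


def grad_sq_norm(x: Vector) -> List[float]:
    return [2.0 * a for a in x]


# phi(x) = x^T A x, with A symmetric positive definite, generates the
# squared Mahalanobis distance (x - y)^T A (x - y).
def make_quadratic(A: Sequence[Vector]):
    def phi(x: Vector) -> float:
        return dot(x, matvec(A, x))

    def grad(x: Vector) -> List[float]:
        return [2.0 * c for c in matvec(A, x)]  # gradient 2Ax is valid because A is symmetric

    return phi, grad


# Negative entropy generates the KL divergence on the probability simplex
# (off the simplex an extra term sum(q) - sum(p) appears).
def neg_entropy(p: Vector) -> float:
    return sum(pi * math.log(pi) for pi in p)


def grad_neg_entropy(p: Vector) -> List[float]:
    return [math.log(pi) + 1.0 for pi in p]


def kl(p: Vector, q: Vector) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
    print(bregman(sq_norm, grad_sq_norm, p, q))                    # approximately 0.06 = ||p - q||^2
    print(bregman(neg_entropy, grad_neg_entropy, p, q), kl(p, q))  # agree up to rounding
    phi_A, grad_A = make_quadratic([[2.0, 0.5], [0.5, 1.0]])
    print(bregman(phi_A, grad_A, [1.0, 2.0], [0.0, 1.0]))          # 4.0
```

The gradients are supplied by hand rather than by automatic differentiation so that the sketch stays dependency-free.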

Abstract

Bregman divergences are a class of distance-like comparison functions which play fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they cause two useful formulations of information content (in the sense of variability or non-uniformity) in a weighted collection of vectors to agree. In this note, we show that this agreement in fact characterizes the class of Bregman divergences; they are the only divergences which generate this agreement for arbitrary collections of weighted vectors.
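The agreement described in the abstract can be illustrated numerically. The sketch below assumes the common forms $I_\phi(\mu,X)=\sum_i \mu_i\phi(x_i)-\phi(\bar{x})$ and $I_d(\mu,X)=\sum_i \mu_i d(x_i,\bar{x})$, with $\bar{x}=\sum_i \mu_i x_i$ and weights summing to one; the paper's Definitions 1 and 3 may be stated differently, and all function names are hypothetical. With $\phi(x)=\lVert x\rVert^2$, the squared Euclidean distance (its Bregman divergence) makes the two informations agree on every random dataset tried, while the plain, unsquared Euclidean distance does not.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Phi = Callable[[Vector], float]
Div = Callable[[Vector, Vector], float]


def weighted_mean(mu: Sequence[float], X: Sequence[Vector]) -> List[float]:
    """xbar = sum_i mu_i x_i (weights assumed to sum to one)."""
    dim = len(X[0])
    return [sum(m * x[k] for m, x in zip(mu, X)) for k in range(dim)]


def jensen_gap_information(phi: Phi, mu, X) -> float:
    """Assumed form of I_phi(mu, X): sum_i mu_i phi(x_i) - phi(xbar)."""
    return sum(m * phi(x) for m, x in zip(mu, X)) - phi(weighted_mean(mu, X))


def divergence_information(d: Div, mu, X) -> float:
    """Assumed form of I_d(mu, X): sum_i mu_i d(x_i, xbar)."""
    xbar = weighted_mean(mu, X)
    return sum(m * d(x, xbar) for m, x in zip(mu, X))


def information_equivalent(phi: Phi, d: Div, mu, X, tol: float = 1e-9) -> bool:
    """Checks I_phi == I_d on ONE dataset; the property itself quantifies over all."""
    return math.isclose(jensen_gap_information(phi, mu, X),
                        divergence_information(d, mu, X), abs_tol=tol)


def sq_norm(x: Vector) -> float:
    return sum(a * a for a in x)


def sq_euclidean(x: Vector, y: Vector) -> float:
    """Bregman divergence generated by sq_norm."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x: Vector, y: Vector) -> float:
    """Plain Euclidean distance: not the Bregman divergence of sq_norm."""
    return math.sqrt(sq_euclidean(x, y))


def random_weighted_dataset(n: int, dim: int, rng: random.Random):
    raw = [rng.random() + 0.1 for _ in range(n)]  # strictly positive weights
    total = sum(raw)
    mu = [w / total for w in raw]                 # normalize so weights sum to one
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return mu, X


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(100):
        mu, X = random_weighted_dataset(5, 3, rng)
        assert information_equivalent(sq_norm, sq_euclidean, mu, X)

    mu, X = [0.5, 0.5], [[0.0], [1.0]]
    print(jensen_gap_information(sq_norm, mu, X))       # 0.25
    print(divergence_information(sq_euclidean, mu, X))  # 0.25
    print(divergence_information(euclidean, mu, X))     # 0.5
```

A finite set of random trials only illustrates the property; it is the theorem, not the check, that establishes the characterization.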

Paper Structure

This paper contains 4 sections, 6 theorems, and 25 equations.

Key Result

Lemma 1

If $d = d_\phi$, then the pair $(\phi, d)$ satisfies the information equivalence property.
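Assuming, as in the Bregman-information setting, that $I_\phi(\mu,X)=\sum_i \mu_i\phi(x_i)-\phi(\bar{x})$ and $I_d(\mu,X)=\sum_i \mu_i d(x_i,\bar{x})$, with $\bar{x}=\sum_i \mu_i x_i$ and $\sum_i \mu_i = 1$, a short calculation suggests why Lemma 1 holds; the paper's own definitions and proof may be phrased differently. The linear term vanishes because the weighted deviations from $\bar{x}$ sum to zero:

$$
\begin{aligned}
I_{d_\phi}(\mu,X) &= \sum_i \mu_i \left[\phi(x_i) - \phi(\bar{x}) - \nabla\phi(\bar{x})^T (x_i - \bar{x})\right] \\
&= \sum_i \mu_i \phi(x_i) - \phi(\bar{x}) - \nabla\phi(\bar{x})^T \Big(\sum_i \mu_i x_i - \bar{x}\Big) \\
&= \sum_i \mu_i \phi(x_i) - \phi(\bar{x}) = I_\phi(\mu,X).
\end{aligned}
$$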

Theorems & Definitions (15)

  • Definition 1: Jensen Gap Information
  • Definition 2: Divergence
  • Definition 3: Divergence Information
  • Definition 4: Information Equivalence
  • Lemma 1: Information Equivalence with Bregman Divergences [banerjeeOptimalBregmanPrediction2004; banerjeeClusteringBregmanDivergences2004]
  • Theorem 1
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • ...and 5 more