A Generalized Bias-Variance Decomposition for Bregman Divergences

David Pfau

TL;DR

This paper generalizes the classical bias-variance decomposition from squared error to losses given by a Bregman divergence $D_F$, which covers exponential-family log-likelihood losses such as the cross-entropy. It provides a self-contained derivation of the generalized decomposition: with $f^*(X)=\mathbb{E}[Y]$ and $\bar{f}(X)=\arg\min_z \mathbb{E}_D[D_F[z||f_D(X)]]$, the expected loss satisfies $\mathbb{E}_{D,Y}[D_F[Y||f_D(X)]] = \mathbb{E}_Y[D_F[Y||f^*(X)]] + D_F[f^*(X)||\bar{f}(X)] + \mathbb{E}_D[D_F[\bar{f}(X)||f_D(X)]]$. It also derives the two optimality conditions behind this result: $x^*$ minimizes $\mathbb{E}[D_F[z||X]]$ over $z$ if and only if $\nabla F(x^*) = \mathbb{E}[\nabla F(X)]$, and $z = \mathbb{E}[X]$ minimizes $\mathbb{E}[D_F[X||z]]$. Finally, it connects to exponential-family theory by showing that the log-likelihood can be written as a Bregman divergence generated by the convex conjugate $A^*$, clarifying links to maximum-likelihood estimation and proper scoring rules.
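As an illustration of the decomposition, the following minimal sketch checks it numerically for one particular Bregman divergence. Everything concrete here is an assumption made for the example and does not come from the paper: the generator $F(x) = x\log x - x$ on positive reals (which gives $D_F[p||q] = p\log(p/q) - p + q$), the toy distributions for $Y$ and $f_D(X)$ at a single fixed input, and the helper names `bregman`, `expect`, `noise`, `bias`, and `variance`. Because $\nabla F(x) = \log x$ for this choice, $\bar{f}(X)$ is the geometric mean of the predictions.

```python
import math

def bregman(p, q):
    """D_F[p||q] for the assumed generator F(x) = x*log(x) - x, with p, q > 0."""
    return p * math.log(p / q) - p + q

def expect(values, probs, fn):
    """Exact expectation of fn over a finite discrete distribution."""
    return sum(w * fn(v) for v, w in zip(values, probs))

# Hypothetical toy setup at one fixed input X.
# Y is the noisy target; f_D(X) is the prediction of a model trained on dataset D.
y_vals, y_probs = [0.5, 1.0, 3.0], [0.2, 0.5, 0.3]
f_vals, f_probs = [0.8, 1.5, 2.5, 1.1], [0.25, 0.25, 0.25, 0.25]

# f*(X) = E[Y]: the ordinary mean of the target.
f_star = expect(y_vals, y_probs, lambda y: y)

# f_bar(X) solves grad F(f_bar) = E_D[grad F(f_D)]; with grad F = log,
# this is the geometric mean of the predictions.
f_bar = math.exp(expect(f_vals, f_probs, math.log))

# Left-hand side: expected loss over both Y and D (taken as independent).
total = expect(
    y_vals, y_probs,
    lambda y: expect(f_vals, f_probs, lambda f: bregman(y, f)),
)

# Right-hand side: the three terms of the decomposition.
noise = expect(y_vals, y_probs, lambda y: bregman(y, f_star))
bias = bregman(f_star, f_bar)
variance = expect(f_vals, f_probs, lambda f: bregman(f_bar, f))

print("expected loss:", total)
print("noise + bias + variance:", noise + bias + variance)
```

The two printed values should agree up to floating-point error. Note that the "average prediction" in the variance term is not the arithmetic mean of $f_D(X)$ unless $F$ is the squared norm.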

Abstract

The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence, which is relevant to maximum likelihood estimation with exponential families. While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes. A version of this note previously appeared on the author's personal website without context. Here we provide additional discussion and references to the relevant prior literature.

Paper Structure

This paper contains 3 sections, 4 theorems, 14 equations.

Key Result

Lemma 2.2

Let $F : \mathcal{S} \to \mathbb{R}$ be a strictly convex differentiable function, and let $X$ be a random variable on $\mathcal{S}$. Then $x^* = \arg \min_z \mathbb{E}\left[D_F[z||X]\right] \Leftrightarrow \nabla F(x^*) = \mathbb{E}\left[\nabla F(X)\right]$ and $\mathbb{E}\left[X\right] = \arg \min_z \mathbb{E}\left[D_F[X||z]\right]$.
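The asymmetry between the two arguments of $D_F$ can be seen in a small numerical sketch. The generator $F(x) = x\log x - x$, the sample `xs`, the search grid, and all function names below are assumptions chosen for illustration and are not taken from the paper. For this $F$, $\nabla F(x) = \log x$, so the condition $\nabla F(x^*) = \mathbb{E}[\nabla F(X)]$ yields the geometric mean, while minimizing over the second argument, $\mathbb{E}[D_F[X||z]]$, yields the arithmetic mean $\mathbb{E}[X]$.

```python
import math

def bregman(p, q):
    """D_F[p||q] for the assumed generator F(x) = x*log(x) - x, with p, q > 0."""
    return p * math.log(p / q) - p + q

def mean(values):
    return sum(values) / len(values)

# Hypothetical sample of X, each value equally likely.
xs = [0.5, 1.0, 2.0, 4.0]

def first_arg_objective(z):
    """E[D_F[z||X]]: z sits in the first argument."""
    return mean([bregman(z, x) for x in xs])

def second_arg_objective(z):
    """E[D_F[X||z]]: z sits in the second argument."""
    return mean([bregman(x, z) for x in xs])

# Closed-form minimizers stated by the lemma.
dual_mean = math.exp(mean([math.log(x) for x in xs]))  # solves log(z) = E[log X]
arith_mean = mean(xs)                                  # E[X]

# Brute-force check on a grid with spacing 0.01.
grid = [0.01 * k for k in range(20, 500)]
best_first = min(grid, key=first_arg_objective)
best_second = min(grid, key=second_arg_objective)

print("first-argument minimizer:", best_first, "vs closed form", dual_mean)
print("second-argument minimizer:", best_second, "vs closed form", arith_mean)
```

Each grid minimizer should land within one grid step of its closed-form counterpart, and the two minimizers differ from each other because this $F$ is not the squared norm.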

Theorems & Definitions (10)

  • Definition 2.1: Bregman Divergence
  • Lemma 2.2: Minimum Expected Bregman Divergence
  • proof
  • Theorem 2.3: Decomposition of Expected Bregman Divergence
  • proof
  • Theorem 2.4: Generalized Bias-Variance Decomposition
  • proof
  • Definition A.1: Exponential Family
  • Lemma A.2: Exponential Family as a Bregman Divergence
  • proof