A Generalized Bias-Variance Decomposition for Bregman Divergences
David Pfau
TL;DR
This paper generalizes the classical bias-variance decomposition from the squared error to losses given by a Bregman divergence $D_F$, which covers exponential-family log-likelihood losses such as the cross-entropy. It provides a self-contained derivation of the generalized decomposition: with $f^*(X)=\mathbb{E}[Y]$ and $\bar{f}(X)=\arg\min_z \mathbb{E}_D[D_F[z||f_D(X)]]$, the expected loss satisfies $\mathbb{E}_{D,Y}[D_F[Y||f_D(X)]] = \mathbb{E}_Y[D_F[Y||f^*(X)]] + D_F[f^*(X)||\bar{f}(X)] + \mathbb{E}_D[D_F[\bar{f}(X)||f_D(X)]]$, where the three terms play the roles of noise, bias, and variance, respectively. It also derives the two optimality conditions behind this result: the minimizer $x^*$ of $\mathbb{E}[D_F[z||X]]$ over $z$ satisfies $\nabla F(x^*) = \mathbb{E}[\nabla F(X)]$, whereas $z = \mathbb{E}[X]$ minimizes $\mathbb{E}[D_F[X||z]]$. Finally, it connects to exponential-family theory by showing that the log-likelihood can be written as a Bregman divergence generated by the convex conjugate $A^*$ of the log-partition function $A$, clarifying the links to maximum-likelihood estimation and proper scoring rules.
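As an illustration only, the decomposition can be checked numerically for the cross-entropy case. The sketch below is not taken from the paper and rests on several assumptions: $F$ is the negative entropy $F(p)=\sum_i p_i \log p_i$, so that $D_F$ reduces to the KL divergence on the probability simplex; $Y$ is a one-hot label with class probabilities `p_true`; the randomness over datasets $D$ is simulated by a finite set of hypothetical predictive distributions `preds`; and the minimization defining $\bar{f}$ is restricted to the simplex, in which case the condition $\nabla F(\bar{f}) = \mathbb{E}_D[\nabla F(f_D)]$ holds up to a normalizing constant and yields the normalized geometric mean of the predictions. All names and sizes are illustrative.

```python
import numpy as np

def neg_entropy(p):
    """F(p) = sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0 covers the zero entries
    return float(np.sum(p * np.log(safe)))

def grad_neg_entropy(q):
    """Gradient of F at q; requires q > 0 elementwise."""
    return np.log(np.asarray(q, dtype=float)) + 1.0

def bregman(p, q):
    """D_F[p || q] = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return neg_entropy(p) - neg_entropy(q) - float(np.dot(grad_neg_entropy(q), p - q))

rng = np.random.default_rng(0)
n_classes, n_models = 4, 50  # illustrative sizes, not from the paper

p_true = rng.dirichlet(np.ones(n_classes))                # f*(X) = E[Y]
preds = rng.dirichlet(np.ones(n_classes), size=n_models)  # f_D(X), one row per simulated D

# f_bar: normalized geometric mean of the predictions (simplex-constrained minimizer).
log_mean = np.log(preds).mean(axis=0)
f_bar = np.exp(log_mean - log_mean.max())
f_bar /= f_bar.sum()

one_hots = np.eye(n_classes)

def expect_over_y(q):
    """E_Y[D_F[Y || q]] computed exactly by enumerating the one-hot labels."""
    return sum(p_true[c] * bregman(one_hots[c], q) for c in range(n_classes))

total = float(np.mean([expect_over_y(q) for q in preds]))   # E_{D,Y}[D_F[Y || f_D]]
noise = expect_over_y(p_true)                               # E_Y[D_F[Y || f*]]
bias = bregman(p_true, f_bar)                               # D_F[f* || f_bar]
variance = float(np.mean([bregman(f_bar, q) for q in preds]))  # E_D[D_F[f_bar || f_D]]

print(f"total = {total:.6f}, noise + bias + variance = {noise + bias + variance:.6f}")
```

Under these assumptions the two printed quantities should agree up to floating-point error. Here the noise term is the entropy of `p_true`, the bias term is the KL divergence from `p_true` to the normalized geometric mean, and the variance term is the average KL divergence from that mean to each prediction.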
Abstract
The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence, which is relevant to maximum likelihood estimation with exponential families. While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes. A version of this note previously appeared on the author's personal website without context. Here we provide additional discussion and references to the relevant prior literature.
