Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty

Arthur Charpentier; Agathe Fernandes-Machado

Abstract

Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a grouping term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.
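
As a rough numerical illustration of the three-term identity for the Brier score, the sketch below computes the terms exactly on a small discrete feature space. It assumes the forms used in the Figure 2 caption, reliability $\mathbb{E}[(S-C)^2]$ and grouping $\mathbb{E}[(C-Q)^2]$, with $C=\mathbb{E}[Y\mid S]$ and $Q=\mathbb{E}[Y\mid \boldsymbol{X}]$, and takes the irreducible term to be $\mathbb{E}[Q(1-Q)]$. The function name `brier_three_terms` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def brier_three_terms(p_x, q_x, s_x):
    """Exact Brier terms on a finite feature space (illustrative helper).

    p_x: probability of each feature value x
    q_x: Q(x) = P(Y = 1 | X = x)
    s_x: score s(x) assigned to each feature value
    """
    p_x, q_x, s_x = (np.asarray(a, dtype=float) for a in (p_x, q_x, s_x))

    # C = E[Y | S]: average Q over all feature values sharing the same score.
    c_x = np.empty_like(q_x)
    for s in np.unique(s_x):
        mask = s_x == s
        c_x[mask] = np.sum(p_x[mask] * q_x[mask]) / np.sum(p_x[mask])

    # Expected Brier score E[(S - Y)^2], using Y | X = x ~ Bernoulli(Q(x)).
    brier = np.sum(p_x * (q_x * (1 - s_x) ** 2 + (1 - q_x) * s_x ** 2))

    return {
        "brier": brier,
        "miscalibration": np.sum(p_x * (s_x - c_x) ** 2),
        "grouping": np.sum(p_x * (c_x - q_x) ** 2),
        "uncertainty": np.sum(p_x * q_x * (1 - q_x)),
        "calibrated_score": c_x,
    }

# Toy example (made-up numbers): four feature values, a score with two levels.
p_x = [0.25, 0.25, 0.25, 0.25]
q_x = [0.1, 0.3, 0.6, 0.9]
s_x = [0.3, 0.3, 0.7, 0.7]

terms = brier_three_terms(p_x, q_x, s_x)
total = terms["miscalibration"] + terms["grouping"] + terms["uncertainty"]

# Replacing S by C = E[Y | S] mimics an ideal post-hoc recalibration.
recal = brier_three_terms(p_x, q_x, terms["calibrated_score"])

print(terms["brier"], total)
print(recal["miscalibration"], recal["grouping"])
```

In this toy case the three terms add up to the expected Brier score, and replacing $S$ by $C$ sets the miscalibration term to zero while the grouping term stays the same, since the recalibrated score retains no more information about $\boldsymbol{X}$ than the original.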

Paper Structure (75 sections, 16 theorems, 68 equations, 6 figures, 3 tables)

Introduction
Setup and notation
Related work
Contributions and organization
Contributions.
Organization.
Decompositions
General Decompositions
Back to Calibration
Fundamental uncertainty
Well-calibration
Global balance (marginal calibration).
Calibrated scores and a proper-loss notion of calibration error
Diagnostics and visualization
Recalibration
...and 60 more sections

Key Result

Theorem 2.1

Let $\mathcal{A}\subseteq\mathcal{F}$ and let $T$ be an $\mathcal{A}$-measurable random variable taking values in $\Delta(\mathcal{Y})$. Then
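
The displayed identity is not reproduced in this listing. Going by the abstract, which describes a split into a proper-regret term and a conditional entropy term, it plausibly has the shape sketched below. The notation is an assumption of this sketch and may differ from the paper's: $\ell$ is a proper loss, $Q_{\mathcal{A}}=\mathbb{P}(Y\in\cdot\mid\mathcal{A})$ is the conditional law of $Y$, $L(p,q)=\mathbb{E}_{Y\sim q}[\ell(p,Y)]$ is the expected loss of forecast $p$ under $q$, $H(q)=L(q,q)$ is the generalized entropy, and $d(q,p)=L(p,q)-L(q,q)\ge 0$ is the associated divergence (regret).

$$
\mathbb{E}\big[\ell(T,Y)\big]
= \underbrace{\mathbb{E}\big[d(Q_{\mathcal{A}},T)\big]}_{\text{proper regret (reliability)}}
+ \underbrace{\mathbb{E}\big[H(Q_{\mathcal{A}})\big]}_{\text{conditional entropy (residual uncertainty)}}
$$

Under these assumptions the identity follows by conditioning on $\mathcal{A}$: since $T$ is $\mathcal{A}$-measurable, $\mathbb{E}[\ell(T,Y)\mid\mathcal{A}]=L(T,Q_{\mathcal{A}})=d(Q_{\mathcal{A}},T)+H(Q_{\mathcal{A}})$, and taking expectations gives the two terms.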

Figures (6)

Figure 1: Left: learned monotone calibration maps $\widehat{g}(\cdot)$ for the two base models ($\widehat{s}_1$ in green, $\widehat{s}_2$ in blue) and for their average $\widehat{s}_{\mathrm{ens}}$ (red), shown here for $\rho=0$. The grey diagonal corresponds to perfect calibration. Right: local calibration score as a function of $\rho$ for the two base models (green/blue) and their average (red), computed on an independent test split.
Figure 2: Reducible Brier components on test data. Bars show reliability $\widehat{\mathbb{E}}[(S-\widehat{C})^2]$ and grouping $\widehat{\mathbb{E}}[(\widehat{C}-Q)^2]$ before and after monotone recalibration, for three scores: a logistic model on $X_1$ (left), a logistic model on $(X_1,X_2)$ (middle), and a quantized score (right). Recalibration eliminates reliability but leaves grouping essentially unchanged.
Figure 3: GermanCredit: monotone-smoothed calibration curves on the test split. Green: raw score; red: after monotone post-hoc recalibration fit on a separate calibration split. The diagonal indicates perfect calibration.
Figure 4: GermanCredit: summary metrics on the test split. Left: miscalibration proxy $\mathrm{LCS}=\widehat{\mathbb{E}}[(S-\widehat{C}(S))^2]$. Middle/right: proper losses (Brier and log-loss) before vs. after recalibration.
Figure 5: Left: true conditional probability surface $Q(x_1,x_2)$ for the synthetic model. Remaining panels: scatterplots of simulated data with $\rho=-0.7$ (middle left), $\rho=0$ (middle right), and $\rho=0.7$ (right).
...and 1 more figure

Theorems & Definitions (34)

Theorem 2.1
Theorem 2.2: Chain decomposition for nested information levels
Lemma 2.3: When does the grouping term vanish?
Example 2.4: Perfect calibration does not prevent large information loss
Proposition 2.5: Uncertainty--resolution--reliability form
Theorem 2.6
Corollary 2.7: Variance-type decomposition (Brier)
Corollary 2.8: Log-loss: entropies and information
Definition 3.1: Perfect calibration
Proposition 3.2: Calibration error as proper regret
...and 24 more
