Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty

Arthur Charpentier, Agathe Fernandes-Machado

Abstract

Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a "grouping" term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.
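
For the Brier score with binary $Y$, the three-term identity takes a variance-type form. The display below is a sketch, not the paper's own statement: the symbols $C=\mathbb{E}[Y\mid S]$ (calibrated score) and $Q=\mathbb{E}[Y\mid\boldsymbol{X}]$ (true conditional probability) are assumed here to match the figure captions, and the paper may use different notation.

```latex
\mathbb{E}\big[(S-Y)^2\big]
  = \underbrace{\mathbb{E}\big[(S-C)^2\big]}_{\text{miscalibration}}
  + \underbrace{\mathbb{E}\big[(C-Q)^2\big]}_{\text{grouping}}
  + \underbrace{\mathbb{E}\big[Q(1-Q)\big]}_{\text{irreducible uncertainty}},
\qquad C=\mathbb{E}[Y\mid S],\quad Q=\mathbb{E}[Y\mid \boldsymbol{X}].
```

The cross terms vanish by the tower property, because $S$ and $C$ are functions of $S$, and $S=s(\boldsymbol{X})$ is itself a function of $\boldsymbol{X}$.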

Paper Structure (75 sections, 16 theorems, 68 equations, 6 figures, 3 tables)

Key Result

Theorem 2.1

Let $\mathcal{A}\subseteq\mathcal{F}$ and let $T$ be an $\mathcal{A}$-measurable random variable taking values in $\Delta(\mathcal{Y})$. Then
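
The abstract describes this identity as a split of the expected loss into a proper-regret (reliability) term and a conditional-entropy (residual uncertainty) term. A sketch of that shape follows. The symbols $\ell$, $P_{\mathcal{A}}$, $H$ and $D$ are assumptions introduced for illustration and may differ from the paper's notation.

```latex
% Assumed notation (not taken from the paper):
%   \ell              a proper loss on \Delta(\mathcal{Y}) \times \mathcal{Y}
%   P_{\mathcal{A}}   the conditional law \mathbb{P}(Y \in \cdot \mid \mathcal{A})
%   H(p) = \mathbb{E}_{Y\sim p}[\ell(p,Y)]            generalized entropy
%   D(p,q) = \mathbb{E}_{Y\sim p}[\ell(q,Y)] - H(p)   proper regret, \ge 0 by properness
\mathbb{E}\big[\ell(T,Y)\big]
  = \underbrace{\mathbb{E}\big[D(P_{\mathcal{A}},T)\big]}_{\text{reliability}}
  + \underbrace{\mathbb{E}\big[H(P_{\mathcal{A}})\big]}_{\text{residual uncertainty}}.
```

Under these assumptions the identity follows by conditioning on $\mathcal{A}$: since $T$ is $\mathcal{A}$-measurable, $\mathbb{E}[\ell(T,Y)\mid\mathcal{A}]=\mathbb{E}_{Y\sim P_{\mathcal{A}}}[\ell(T,Y)]=D(P_{\mathcal{A}},T)+H(P_{\mathcal{A}})$.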

Figures (6)

  • Figure 1: Left: learned monotone calibration maps $\widehat{g}(\cdot)$ for the two base models ($\widehat{s}_1$ in green, $\widehat{s}_2$ in blue) and for their average $\widehat{s}_{\mathrm{ens}}$ (red), shown here for $\rho=0$. The grey diagonal corresponds to perfect calibration. Right: local calibration score as a function of $\rho$ for the two base models (green/blue) and their average (red), computed on an independent test split.
  • Figure 2: Reducible Brier components on test data. Bars show reliability $\widehat{\mathbb{E}}[(S-\widehat{C})^2]$ and grouping $\widehat{\mathbb{E}}[(\widehat{C}-Q)^2]$ before and after monotone recalibration, for three scores: a logistic model on $X_1$ (left), a logistic model on $(X_1,X_2)$ (middle), and a quantized score (right). Recalibration eliminates reliability but leaves grouping essentially unchanged.
  • Figure 3: GermanCredit: monotone-smoothed calibration curves on the test split. Green: raw score; red: after monotone post-hoc recalibration fit on a separate calibration split. The diagonal indicates perfect calibration.
  • Figure 4: GermanCredit: summary metrics on the test split. Left: miscalibration proxy $\mathrm{LCS}=\widehat{\mathbb{E}}(S-\widehat{C}(S))^2$. Middle/right: proper losses (Brier and log-loss) before vs. after recalibration.
  • Figure 5: Left: true conditional probability surface $Q(x_1,x_2)$ for the synthetic model. The three remaining panels show scatterplots of simulated data with $\rho=-0.7$ (middle left), $\rho=0$ (middle right), and $\rho=0.7$ (right).
  • ...and 1 more figure
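
The claim in the Figure 2 caption, that recalibration removes reliability but leaves grouping essentially unchanged, can be checked exactly on a toy population. The sketch below is not the paper's experiment: the six-point feature space, the probabilities and the two-valued score are hypothetical, and recalibration is idealized as the population map $S\mapsto\mathbb{E}[Y\mid S]$ rather than a fitted monotone estimator.

```python
import numpy as np

# Hypothetical finite feature space (illustrative numbers, not from the paper).
p_x = np.array([0.10, 0.20, 0.20, 0.20, 0.20, 0.10])  # P(X = x)
q = np.array([0.05, 0.20, 0.40, 0.60, 0.80, 0.95])    # Q = P(Y = 1 | X = x)
s = np.array([0.30, 0.30, 0.30, 0.90, 0.90, 0.90])    # coarse, miscalibrated score S = s(X)


def calibrated(s, q, p_x):
    """C = E[Y | S]: probability-weighted mean of Q on each level set of S."""
    c = np.empty_like(q)
    for v in np.unique(s):
        m = s == v
        c[m] = np.sum(p_x[m] * q[m]) / np.sum(p_x[m])
    return c


def brier_terms(s, q, p_x):
    """Exact expected Brier loss followed by its three components."""
    c = calibrated(s, q, p_x)
    loss = np.sum(p_x * (s**2 - 2 * s * q + q))  # E[(S - Y)^2], using Y^2 = Y
    reliability = np.sum(p_x * (s - c) ** 2)     # E[(S - C)^2]
    grouping = np.sum(p_x * (c - q) ** 2)        # E[(C - Q)^2]
    uncertainty = np.sum(p_x * q * (1 - q))      # E[Q (1 - Q)]
    return loss, reliability, grouping, uncertainty


raw = brier_terms(s, q, p_x)
# Idealized recalibration: replace S by C(S); the level sets of the score are unchanged.
recal = brier_terms(calibrated(s, q, p_x), q, p_x)

for name, (loss, rel, grp, unc) in [("raw", raw), ("recalibrated", recal)]:
    print(f"{name:>12}: loss={loss:.4f} reliability={rel:.4f} "
          f"grouping={grp:.4f} uncertainty={unc:.4f}")
```

In this toy setting the recalibration map is injective on the two score values, so the grouping term is exactly preserved and the loss falls by exactly the raw reliability term. With a fitted monotone map on finite data, the preservation would only be approximate.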

Theorems & Definitions (34)

  • Theorem 2.1
  • Theorem 2.2: Chain decomposition for nested information levels
  • Lemma 2.3: When does the grouping term vanish?
  • Example 2.4: Perfect calibration does not prevent large information loss
  • Proposition 2.5: Uncertainty--resolution--reliability form
  • Theorem 2.6
  • Corollary 2.7: Variance-type decomposition (Brier)
  • Corollary 2.8: Log-loss: entropies and information
  • Definition 3.1: Perfect calibration
  • Proposition 3.2: Calibration error as proper regret
  • ...and 24 more
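
The titles of Example 2.4 and Corollary 2.8 suggest a simple log-loss illustration: a constant score equal to the base rate is perfectly calibrated, yet it discards all feature information. The sketch below uses a hypothetical two-point feature space and assumes the standard log-loss forms of the three terms (KL divergences and Shannon entropy, in nats); it does not reproduce the paper's own example.

```python
import numpy as np

# Hypothetical two-point feature space (illustrative numbers, not from the paper).
p_x = np.array([0.5, 0.5])       # P(X = x)
q = np.array([0.1, 0.9])         # Q = P(Y = 1 | X = x)
base_rate = np.sum(p_x * q)      # P(Y = 1)
s = np.full_like(q, base_rate)   # constant score: ignores X entirely


def entropy(p):
    """Shannon entropy of Bernoulli(p) in nats, for p strictly inside (0, 1)."""
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def kl(p, r):
    """KL(Bernoulli(p) || Bernoulli(r)) in nats, for p and r strictly inside (0, 1)."""
    return p * np.log(p / r) + (1 - p) * np.log((1 - p) / (1 - r))


# A constant score equal to the base rate satisfies E[Y | S] = S, so C = S.
c = s.copy()

loss = np.sum(p_x * -(q * np.log(s) + (1 - q) * np.log(1 - s)))  # expected log-loss
reliability = np.sum(p_x * kl(c, s))     # E[KL(C || S)], zero here
grouping = np.sum(p_x * kl(q, c))        # E[KL(Q || C)]
uncertainty = np.sum(p_x * entropy(q))   # E[H(Q)] = H(Y | X)

# For a constant score, grouping equals the mutual information I(Y; X) = H(Y) - H(Y | X).
mutual_info = entropy(base_rate) - uncertainty

print(f"loss={loss:.4f} reliability={reliability:.4f} "
      f"grouping={grouping:.4f} uncertainty={uncertainty:.4f}")
```

Here the reliability term is zero, while the grouping term accounts for more than half of the expected log-loss, so a calibration check alone would not flag the lost information.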