Three Types of Calibration with Properties and their Semantic and Formal Relationships

Rabanus Derr, Jessie Finocchiaro, Robert C. Williamson

TL;DR

The paper addresses the fragmentation of calibration notions by proposing a semantic map that organizes them into three core types: distribution calibration with respect to a property, $\Gamma$-calibration (self-realization), and decision calibration (precise loss estimation). It demonstrates that distribution calibration is the central notion that implies the others under suitable conditions, and shows how self-realization and precise loss estimation relate through inheritance to refined properties and omniprediction concepts. By formalizing calibration via properties and linking to swap regret, Bayes risk, and reflection principles, it provides a unifying framework that clarifies when different notions coincide or diverge. The work also discusses extensions to groups (multicalibration) and outlines practical implications for trustworthiness, fairness, and the design of calibration-driven decision systems.

Abstract

Fueled by discussions around "trustworthiness" and algorithmic fairness, calibration of predictive systems has regained scholars' attention. The vanilla definition and understanding of calibration is, simply put, that on all days on which the rain probability has been predicted to be $p$, the actual frequency of rain days was $p$. However, the increased attention has led to an immense variety of new notions of "calibration." Some of the notions are incomparable, serve different purposes, or imply each other. In this work, we provide two accounts which motivate calibration: self-realization of forecasted properties and precise estimation of incurred losses of the decision makers relying on forecasts. We substantiate the former via the reflection principle and the latter by actuarial fairness. For both accounts we formulate prototypical definitions via properties $\Gamma$ of outcome distributions, e.g., the mean or median. The prototypical definition for self-realization, which we call $\Gamma$-calibration, is equivalent to a certain type of swap regret under certain conditions. These implications are strongly connected to the omniprediction learning paradigm. The prototypical definition for precise loss estimation is a modification of decision calibration adopted from Zhao et al. [73]. For binary outcome sets both prototypical definitions coincide under appropriate choices of reference properties. For higher-dimensional outcome sets, both prototypical definitions can be subsumed by a natural extension of the binary definition, called distribution calibration with respect to a property. We conclude by commenting on the role of groupings in both accounts of calibration often used to obtain multicalibration. In sum, this work provides a semantic map of calibration in order to navigate a fragmented terrain of notions and definitions.
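
The vanilla notion in the abstract (among all days with forecast $p$, the observed frequency of rain is $p$) can be checked empirically. The following is a minimal sketch, not the paper's formal Definition 1: the helper names `calibration_table` and `max_calibration_gap` and the toy data are hypothetical, and forecasts are assumed to take finitely many values so that exact grouping is meaningful.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Group binary outcomes by forecast value.

    Returns {p: (empirical frequency of outcome 1, number of days)}.
    """
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        buckets[p].append(y)
    return {p: (sum(ys) / len(ys), len(ys)) for p, ys in buckets.items()}

def max_calibration_gap(forecasts, outcomes):
    """Largest |empirical frequency - forecast| over all forecast values."""
    table = calibration_table(forecasts, outcomes)
    return max(abs(freq - p) for p, (freq, _) in table.items())

# Hypothetical toy data: four days with forecast 0.25, two days with forecast 0.5.
forecasts = [0.25, 0.25, 0.25, 0.25, 0.5, 0.5]
outcomes = [1, 0, 0, 0, 1, 0]

print(calibration_table(forecasts, outcomes))
print(max_calibration_gap(forecasts, outcomes))
```

A gap of zero on this sample corresponds to perfect vanilla calibration; with finite data one would normally expect a small nonzero gap even for a well-calibrated forecaster.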

Paper Structure

This paper contains 27 sections, 24 theorems, 91 equations, 5 figures, 2 tables.

Key Result

Proposition 4

Let $\Gamma \colon \mathcal{P} \to \mathcal{R}$ be a property and $D$ a regular data distribution on $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{Y}$ is finite. Let $f \colon \mathcal{X} \to \mathcal{P}$ be a distributional predictor with $| \mathrm{im} f | < \infty$. If the predictor $f$ is ...

Figures (5)

  • Figure 1: Relationships between Notions of Calibration. Implications under perfect calibration, finite $\mathcal{Y}$, and elicitable properties $\Gamma$ and $\Phi$. The three types of calibration are marked in different colors. The abstract accounts of calibration are shaded.
  • Figure 2: Relationships between Approximate Notions of Calibration. Implications under approximate calibration and finite $\mathcal{Y}$. Further conditions are stated in the referenced propositions; in particular, they include Lipschitz and smoothness assumptions. The three types of calibration are marked in different colors. The abstract accounts of calibration are shaded.
  • Figure 3: Illustration of distribution calibration. The outcome set is defined as $\mathcal{Y} = \{ 0,1,2\}$, the input set as $\mathcal{X} = \{ 0,1,2\}$. We define $\Gamma(P) = \arg\,\max_{y \in \mathcal{Y}} P(Y = y)$. The level sets of $\Gamma$ are drawn in different shades of gray. The left-directing markers denote the true conditional distribution for different choices of $x \in \mathcal{X}$. The right-directing markers denote the distribution predicted by a predictor $f$. The purple markers are convex combinations of the blue and red markers, where the convex combination is defined through the marginal distribution on $\mathcal{X}$, which in our case is fixed to be uniform. The dashed lines highlight the deviation of the true outcome distribution conditioned on a value of $\Gamma \circ f$ from the expected forecast conditioned on a value of $\Gamma \circ f$. Only the forecasts change between Figure \ref{fig:distribution calibration perfect} and Figure \ref{fig:distribution calibration notperfect}.
  • Figure 4: Illustration of property refinement and inheritance of distribution calibration. The outcome set is defined as $\mathcal{Y} = \{ 0,1,2\}$, and the input set satisfies $|\mathcal{X}| = 6$. We define $\Gamma(P) = (y_1, y_2, y_3)$ such that $P(Y = y_1) \ge P(Y = y_2) \ge P(Y = y_3)$ and $\Phi(P) = \arg\,\max_{y \in \mathcal{Y}} P(Y = y)$. The level sets of $\Gamma$ have colored boundaries listed in the legend. The level sets of $\Phi$ are drawn in different shades of gray. The property $\Gamma$ refines $\Phi$. We assume that $f$ is a distributional predictor whose predictions are marked as dots. The color of the dots represents $\Gamma\circ f(x)$. The lines indicate the convex combination of predictions that arises when conditioning on $\Phi \circ f(X)$ instead of $\Gamma \circ f(X)$. Since the level sets of $\Phi$ are all convex, the line is always contained within a single level set.
  • Figure 5: Illustration of $\Gamma$-calibration. The outcome set is defined as $\mathcal{Y} = \{ 0,1,2\}$, the input set as $\mathcal{X} = \{ 0,1,2\}$. We define $\Gamma(P) = \arg\,\max_{y \in \mathcal{Y}} P(Y = y)$. The level sets of $\Gamma$ are drawn in different shades of gray. The left-directing markers denote the true conditional distribution for different choices of $x \in \mathcal{X}$. We assume that $f$ is a distributional predictor (right-directing markers) which is then fed into $\Gamma$. The dots denote the true outcome distribution conditioned on a value of $\Gamma \circ f$. If the dot is in the level set of the same color, then the prediction is perfectly $\Gamma$-calibrated. Note that Figure \ref{fig:property calibration perfect} uses the same predictions and outcome distributions as Figure \ref{fig:distribution calibration notperfect}, showing that predictions can be perfectly $\Gamma$-calibrated while only being approximately distribution calibrated with respect to $\Gamma$. Figure \ref{fig:property calibration perfect 2} shows that $\Gamma$-calibration does not necessarily require the true conditional distributions to all live in the correct level set. Only the predictor $f$ is changed from Figure \ref{fig:property calibration perfect} to Figure \ref{fig:property calibration perfect 2}. Figure \ref{fig:property calibration notperfect} then changes the true conditional distribution compared to Figure \ref{fig:property calibration perfect} but fixes $f$, which leads to a violation of the perfect $\Gamma$-calibration constraint.
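
The setting of Figures 3 and 5 ($\mathcal{Y} = \mathcal{X} = \{0,1,2\}$, uniform marginal on $\mathcal{X}$, $\Gamma = \arg\max$) is small enough to compute by hand. The sketch below follows the wording of the captions rather than the formal Definitions 2 and 5, which are not reproduced here: distribution calibration is read as "the true outcome distribution conditioned on $\Gamma \circ f$ equals the expected forecast conditioned on $\Gamma \circ f$", and $\Gamma$-calibration as "the conditioned true outcome distribution lies in the level set named by $\Gamma \circ f$". All function names, the tie-breaking rule, and the numbers are hypothetical.

```python
from fractions import Fraction

def dist(a, b, c):
    """Distribution on {0, 1, 2} given in tenths, stored exactly."""
    return [Fraction(a, 10), Fraction(b, 10), Fraction(c, 10)]

def gamma(p):
    """Gamma(P) = argmax_y P(Y = y); ties broken by the lowest index (assumption)."""
    return max(range(len(p)), key=lambda y: p[y])

def average(dists):
    """Uniform mixture of distributions (uniform marginal on X)."""
    return [sum(col) / len(dists) for col in zip(*dists)]

def conditional_summary(true, pred):
    """For each value r of Gamma(f(x)): (conditioned true outcome dist, expected forecast)."""
    groups = {}
    for x in pred:
        groups.setdefault(gamma(pred[x]), []).append(x)
    return {
        r: (average([true[x] for x in xs]), average([pred[x] for x in xs]))
        for r, xs in groups.items()
    }

def is_distribution_calibrated(true, pred):
    return all(o == f for o, f in conditional_summary(true, pred).values())

def is_gamma_calibrated(true, pred):
    return all(gamma(o) == r for r, (o, _) in conditional_summary(true, pred).items())

# Hypothetical true conditional distributions P(Y | X = x).
true = {0: dist(8, 1, 1), 1: dist(4, 5, 1), 2: dist(1, 2, 7)}
# Two hypothetical predictors; both send x = 0 and x = 1 to the level set of outcome 0.
pred_a = {0: dist(7, 2, 1), 1: dist(5, 4, 1), 2: dist(1, 2, 7)}
pred_b = {0: dist(8, 1, 1), 1: dist(6, 3, 1), 2: dist(1, 2, 7)}

print(is_distribution_calibrated(true, pred_a), is_gamma_calibrated(true, pred_a))
print(is_distribution_calibrated(true, pred_b), is_gamma_calibrated(true, pred_b))
```

Under these assumptions `pred_a` passes both checks, while `pred_b` passes only the $\Gamma$-calibration check, mirroring the caption's remark that a predictor can be perfectly $\Gamma$-calibrated without being perfectly distribution calibrated. Note also that the true distribution at $x = 1$ has mode 1 even though it is grouped under level 0, which matches the observation that the true conditional distributions need not all live in the correct level set.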

Theorems & Definitions (65)

  • Definition 1: Binary Vanilla Calibration
  • Definition 2: Distribution Calibration with Respect to $\Gamma$
  • Definition 3: Property Refinement [frongillo_elicitation_2021]
  • Proposition 4: Distribution Calibration is Inherited
  • proof
  • Definition 5: $\Gamma$-Calibration [noarov_statistical_2023]
  • Proposition 6: Distribution Calibration with Respect to $\Gamma$ implies $\Gamma$-Calibration
  • proof
  • Proposition 7: Approximate Distribution Calibration w.r.t. $\Gamma$ implies approximate $\Gamma$-Calibration for smooth $\Gamma$
  • proof
  • ...and 55 more
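
Property refinement (Definition 3) is illustrated in Figure 4 with the ranking property $\Gamma$ and the mode $\Phi$. The sketch below assumes that "$\Gamma$ refines $\Phi$" means $\Phi$ can be computed from $\Gamma$ alone, i.e. $\Phi = \psi \circ \Gamma$ for some map $\psi$; the formal definition is not reproduced here, so this reading, the link map `psi`, the tie-breaking rule, and the grid check are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def ranking(p):
    """Gamma(P): outcomes ordered by decreasing probability (ties broken by lowest index)."""
    return tuple(sorted(range(len(p)), key=lambda y: (-p[y], y)))

def mode(p):
    """Phi(P): most likely outcome (ties broken by lowest index)."""
    return max(range(len(p)), key=lambda y: p[y])

def psi(rank):
    """Hypothetical link map from a ranking to a mode: the top-ranked outcome."""
    return rank[0]

def simplex_grid(steps=10):
    """All distributions on {0, 1, 2} whose probabilities are multiples of 1/steps."""
    for a, b in product(range(steps + 1), repeat=2):
        if a + b <= steps:
            yield (Fraction(a, steps), Fraction(b, steps), Fraction(steps - a - b, steps))

# Phi agrees with psi(Gamma) on every grid point, so each level set of Gamma
# sits inside a single level set of Phi (on this grid).
refines = all(psi(ranking(p)) == mode(p) for p in simplex_grid())
print(refines)
```

The converse fails: two distributions can share a mode while having different rankings, so the mode does not determine the ranking. This is the sense in which the ranking is the finer of the two properties in Figure 4.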