Truthfulness of Decision-Theoretic Calibration Measures

Mingda Qiao, Eric Zhao

TL;DR

This work analyzes calibration measures for sequential probabilistic forecasting and identifies a fundamental tension between truthfulness and decision-theoretic (no-regret) guarantees. It introduces subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, a measure that is both decision-theoretic and truthful: on any product distribution it is truthful up to an $O(1)$ factor, and in smoothed settings with noise of magnitude $c > 0$ it is truthful up to an $O(\sqrt{\log(1/c)})$ factor. The authors also prove an impossibility result: without smoothing, any complete and decision-theoretic calibration measure must be discontinuous and non-truthful. On the technical side, StepCE is shown to be complete, sound, and decision-theoretic, and an efficient algorithm attains $O(\sqrt{T})$ regret in the adversarial setting; under smoothed analysis, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful with quantified gaps, while U-Calibration is shown to remain non-truthful via explicit constructions. Overall, the paper pairs an impossibility result for the non-smoothed setting with a provably truthful, decision-theoretic calibration measure.

Abstract

Calibration measures quantify how much a forecaster's predictions violate calibration, which requires that forecasts are unbiased conditional on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best-respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these properties. We introduce a new calibration measure termed subsampled step calibration, $\mathsf{StepCE}^{\textsf{sub}}$, that is both decision-theoretic and truthful. In particular, on any product distribution, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(1)$ factor whereas prior decision-theoretic calibration measures suffer from an $e^{-\Omega(T)}$-$\Omega(\sqrt{T})$ truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by a noise of magnitude $c > 0$, $\mathsf{StepCE}^{\textsf{sub}}$ is truthful up to an $O(\sqrt{\log(1/c)})$ factor, while prior decision-theoretic measures have an $e^{-\Omega(T)}$-$\Omega(T^{1/3})$ truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
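
To make the notion of "violating calibration" concrete, the following is a minimal sketch of the classical (unnormalized) L1 calibration error for binary outcomes: group the time steps by forecasted value and sum the absolute bias within each group. This is the standard textbook measure, not the paper's $\mathsf{StepCE}^{\textsf{sub}}$, and the function name `l1_calibration_error` is a hypothetical choice made for this illustration.

```python
from collections import defaultdict


def l1_calibration_error(outcomes, forecasts):
    """Classical unnormalized L1 calibration error (illustrative only).

    outcomes:  sequence of 0/1 values x_1, ..., x_T
    forecasts: sequence of probabilities p_1, ..., p_T in [0, 1]

    Returns the sum, over each distinct forecast value v, of
    |sum over steps t with p_t == v of (x_t - p_t)|.
    Assumption: forecasts are grouped by exact equality of their values.
    """
    if len(outcomes) != len(forecasts):
        raise ValueError("outcomes and forecasts must have the same length")

    # Accumulated bias (outcome minus forecast) for each distinct forecast value.
    bias = defaultdict(float)
    for x, p in zip(outcomes, forecasts):
        bias[p] += x - p

    return sum(abs(b) for b in bias.values())


if __name__ == "__main__":
    xs = [1, 0, 1, 0]
    # Forecasting 0.5 throughout is unbiased on this sequence.
    print(l1_calibration_error(xs, [0.5] * 4))   # 0.0
    # Forecasting 0.75 throughout overestimates by 0.25 per step.
    print(l1_calibration_error(xs, [0.75] * 4))  # 1.0
```

A forecaster who always predicts 0.5 on a sequence with half ones incurs zero error here, while always predicting 0.75 accumulates a bias of 0.25 per step.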

Paper Structure

This paper contains 67 sections, 26 theorems, 212 equations, and 1 algorithm.

Key Result

Lemma 2.2

For any $x \in \{0, 1\}^T$ and $p \in [0, 1]^T$, it holds that

Theorems & Definitions (48)

  • Definition 2.1: Completeness and soundness [HQYZ24]
  • Lemma 2.2: Theorem 8 of [KLST23]
  • Proposition 2.2
  • Proposition 2.3: Proposition A.3 of [HQYZ24]
  • Proposition 4.1
  • proof
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • proof
  • ...and 38 more