Robust Decision Making with Partially Calibrated Forecasts

Shayan Kiyani, Hamed Hassani, George Pappas, Aaron Roth

TL;DR

The paper tackles trustworthy decision making when forecasts are only partially calibrated, especially in high-dimensional settings. It introduces an $\mathcal{H}$-calibration framework to model calibration guarantees, defines an ambiguity set $\mathcal{Q}$ of conditional expectations consistent with these guarantees, and derives a minimax robust decision rule via duality. A key result is a sharp transition: if the calibration class contains decision-calibration tests, the minimax rule collapses to the plug-in best response, enabling simple deployment and simultaneous support for multiple decision problems; outside this regime, the paper provides efficient methods to compute robust policies, including self-orthogonality and bin-wise calibration instantiations. Empirical studies on Bike Sharing and California Housing demonstrate that the robust policy can outperform the plug-in policy under distribution shifts while incurring modest costs under ideal conditions, highlighting the practical value of calibration-aware robust decision making.

Abstract

Calibration has emerged as a foundational goal in "trustworthy machine learning", in part because of its strong decision theoretic semantics. Independent of the underlying distribution, and independent of the decision maker's utility function, calibration promises that amongst all policies mapping predictions to actions, the uniformly best policy is the one that "trusts the predictions" and acts as if they were correct. But this is true only of fully calibrated forecasts, which are tractable to guarantee only for very low dimensional prediction problems. For higher dimensional prediction problems (e.g. when outcomes are multiclass), weaker forms of calibration have been studied that lack these decision theoretic properties. In this paper we study how a conservative decision maker should map predictions endowed with these weaker ("partial") calibration guarantees to actions, in a way that is robust in a minimax sense: i.e. to maximize their expected utility in the worst case over distributions consistent with the calibration guarantees. We characterize their minimax optimal decision rule via a duality argument, and show that surprisingly, "trusting the predictions and acting accordingly" is recovered in this minimax sense by decision calibration (and any strictly stronger notion of calibration), a substantially weaker and more tractable condition than full calibration. For calibration guarantees that fall short of decision calibration, the minimax optimal decision rule is still efficiently computable, and we provide an empirical evaluation of a natural one that applies to any regression model solved to optimize squared error.
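
The claim that a fully calibrated forecast makes "trusting the predictions" the best policy can be checked by hand on a small case. The sketch below is a toy illustration, not taken from the paper: the forecast values, their weights, and the utility table are all hypothetical, and the forecast is assumed perfectly calibrated, so that P(Y = 1 | forecast = v) = v. It enumerates every policy from forecast values to actions and compares the best one with the plug-in best response.

```python
from itertools import product

# Hypothetical toy setup (not from the paper): two forecast values, each
# issued half the time, and a perfectly calibrated binary forecast, i.e.
# P(Y = 1 | forecast = v) = v.
forecasts = [0.2, 0.8]
weights = [0.5, 0.5]

# utility[a][y]: payoff of action a when the outcome is y (illustrative numbers).
utility = [
    [1.0, 0.0],  # action 0 pays off when y = 0
    [0.0, 1.0],  # action 1 pays off when y = 1
]


def expected_utility(action, q):
    """Expected payoff of `action` when P(Y = 1) = q."""
    return (1 - q) * utility[action][0] + q * utility[action][1]


def plug_in_best_response(v):
    """Act as if the forecast v were the true outcome probability."""
    return max(range(len(utility)), key=lambda a: expected_utility(a, v))


def policy_value(policy):
    """Exact expected utility of a policy (one action per forecast value)."""
    return sum(
        w * expected_utility(a, v)
        for w, v, a in zip(weights, forecasts, policy)
    )


plug_in = tuple(plug_in_best_response(v) for v in forecasts)

# Enumerate every deterministic policy from forecast values to actions.
all_policies = list(product(range(len(utility)), repeat=len(forecasts)))
best_value = max(policy_value(p) for p in all_policies)

print("plug-in policy:", plug_in, "value:", policy_value(plug_in))
print("best value over all policies:", best_value)
```

Under these assumptions the plug-in policy attains the maximum over all enumerated policies. If the calibration assumption is dropped, so that the true conditional probability differs from v, the same enumeration can show the plug-in policy being beaten, which is the situation the abstract's robust rule is meant to address.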

Paper Structure

This paper contains 25 sections, 9 theorems, 105 equations, 2 figures, 1 table.

Key Result

Theorem 3.1

Suppose $\mathcal{H}=\mathrm{span}\{h_1,\ldots,h_k\}$ with each $h_i:[0,1]^d\to\mathbb{R}$, and let $\mathcal{Q}$ be defined as above. Then the robust minimax problem admits a saddle point $(a_{\mathrm{robust}},q^\star)$ with the following structure: there exist multipliers $\lambda^\star$ […]. Given $q^\star$, the optimal robust action at $v$ is the best response to $q^\star(v)$.
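
To make the max-min structure concrete, here is a minimal sketch at a single forecast value with a binary outcome. It is not the paper's construction: the interval `[q_low, q_high]` is a hypothetical stand-in for the ambiguity set $\mathcal{Q}$ at that forecast value, the two actions and their payoffs are invented for illustration, and only pure actions are compared, so no multipliers or saddle point are computed.

```python
# Hypothetical payoffs (not from the paper): utility[action] = (u if y=0, u if y=1).
utility = {
    "safe": (0.5, 0.5),   # same payoff whatever the outcome
    "risky": (0.0, 1.0),  # pays off only when y = 1
}


def expected_utility(action, q):
    """Expected payoff of `action` when P(Y = 1) = q."""
    u0, u1 = utility[action]
    return (1 - q) * u0 + q * u1


def plug_in_action(v):
    """Best response to the forecast v taken at face value."""
    return max(utility, key=lambda a: expected_utility(a, v))


def robust_action(q_low, q_high):
    """Pure action maximizing the worst-case expected utility over [q_low, q_high].

    Expected utility is linear in q, so its minimum over an interval is
    attained at one of the two endpoints.
    """
    def worst_case(a):
        return min(expected_utility(a, q_low), expected_utility(a, q_high))

    return max(utility, key=worst_case)


v = 0.6
plug_in = plug_in_action(v)        # trusts the forecast
robust = robust_action(0.4, 0.8)   # hedges against q anywhere in [0.4, 0.8]
collapsed = robust_action(v, v)    # ambiguity set shrinks to the forecast itself

print(plug_in, robust, collapsed)
```

With these numbers the plug-in rule picks the risky action, while the worst case over the interval favors the safe one. When the interval shrinks to the single point v, the robust choice coincides with the plug-in best response, which mirrors in miniature the collapse described in the TL;DR.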

Figures (2)

  • Figure 1: Schematic of the interpolating property
  • Figure 2: Schematic of the Sharp Transition

Theorems & Definitions (17)

  • Remark 2.1
  • Theorem 3.1: Characterization of the Optimal Robust Policy
  • Theorem 4.1: Decision calibration $\Rightarrow$ plug-in best response optimality
  • Theorem 4.2: Decision calibration is sufficient, and remains sufficient under richer tests
  • Corollary 4.3: Simultaneous plug-in optimality across multiple decisions
  • Proposition 4.4: Self-orthogonality under squared loss
  • Proposition 4.5: Robust policy under bin-wise calibration
  • proof
  • proof
  • proof
  • ...and 7 more
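
Proposition 4.4 and the abstract refer to a calibration guarantee that holds for any regression model fit to squared error. The listing does not reproduce the statement, so the sketch below assumes that self-orthogonality means the predictions are uncorrelated with their own residuals, $\mathbb{E}[f(X)\,(Y-f(X))]=0$, and checks the in-sample version for ordinary least squares on synthetic data. The data-generating process and all variable names are hypothetical.

```python
import numpy as np

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ coef
resid = y - pred

# Least-squares residuals are orthogonal to every column of A, hence also
# to the fitted values, which are a linear combination of those columns.
self_orth = float(np.mean(pred * resid))
mean_resid = float(np.mean(resid))

print(f"mean(pred * resid) = {self_orth:.2e}")
print(f"mean(resid)        = {mean_resid:.2e}")
```

Both quantities are zero up to floating-point error on the training sample. This is an in-sample identity that follows from the normal equations; whether it carries over to held-out or shifted data is a separate question that this sketch does not address.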