Robust Decision Making with Partially Calibrated Forecasts
Shayan Kiyani, Hamed Hassani, George Pappas, Aaron Roth
TL;DR
The paper studies trustworthy decision making when forecasts are only partially calibrated, especially in high-dimensional settings. It introduces an $\mathcal{H}$-calibration framework to model calibration guarantees, defines an ambiguity set $\mathcal{Q}$ of conditional expectations consistent with these guarantees, and derives a minimax robust decision rule via duality. A key result is a sharp transition: if the calibration class contains the decision-calibration tests, the minimax rule collapses to the plug-in best response, which permits simple deployment and supports multiple decision problems simultaneously. Outside this regime, the paper provides efficient methods to compute robust policies, with instantiations for self-orthogonality and bin-wise calibration. Experiments on the Bike Sharing and California Housing datasets show that the robust policy can outperform the plug-in policy under distribution shift while incurring a modest cost under ideal conditions.
Abstract
Calibration has emerged as a foundational goal in ``trustworthy machine learning'', in part because of its strong decision theoretic semantics. Independent of the underlying distribution, and independent of the decision maker's utility function, calibration promises that amongst all policies mapping predictions to actions, the uniformly best policy is the one that ``trusts the predictions'' and acts as if they were correct. But this is true only of \emph{fully calibrated} forecasts, which are tractable to guarantee only for very low dimensional prediction problems. For higher dimensional prediction problems (e.g. when outcomes are multiclass), weaker forms of calibration have been studied that lack these decision theoretic properties. In this paper we study how a conservative decision maker should map predictions endowed with these weaker (``partial'') calibration guarantees to actions, in a way that is robust in a minimax sense: i.e. to maximize their expected utility in the worst case over distributions consistent with the calibration guarantees. We characterize their minimax optimal decision rule via a duality argument, and show that surprisingly, ``trusting the predictions and acting accordingly'' is recovered in this minimax sense by \emph{decision calibration} (and any strictly stronger notion of calibration), a substantially weaker and more tractable condition than full calibration. For calibration guarantees that fall short of decision calibration, the minimax optimal decision rule is still efficiently computable, and we provide an empirical evaluation of a natural one that applies to any regression model solved to optimize squared error.
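To make the contrast between ``trusting the predictions'' and acting robustly concrete, the following toy sketch compares a plug-in best response with a worst-case choice over a small set of candidate outcome distributions. It is an illustration under strong simplifying assumptions, not the paper's algorithm: the utility matrix `U`, the prediction `p`, and the finite list `candidates` are hypothetical, the ambiguity set is treated pointwise for a single prediction rather than being derived from calibration constraints, and only pure (non-randomized) actions are considered.

```python
import numpy as np


def plug_in_best_response(U, p):
    """Return the action with the highest expected utility when the
    prediction p is taken at face value.

    U has shape (num_actions, num_outcomes); p is a probability vector.
    """
    return int(np.argmax(U @ p))


def minimax_action(U, candidates):
    """Return the pure action whose worst-case expected utility over a
    finite list of candidate outcome distributions is largest.

    The finite candidate list is an assumption made for illustration.
    """
    Q = np.asarray(candidates)          # shape (num_candidates, num_outcomes)
    expected = U @ Q.T                  # shape (num_actions, num_candidates)
    worst_case = expected.min(axis=1)   # worst candidate for each action
    return int(np.argmax(worst_case))


# Hypothetical utilities: two risky actions and one safe action.
U = np.array([
    [1.0, 0.0],   # action 0 pays off only under outcome 0
    [0.0, 1.0],   # action 1 pays off only under outcome 1
    [0.6, 0.6],   # action 2 pays the same under either outcome
])

p = np.array([0.7, 0.3])                        # the forecast
candidates = [p, np.array([0.4, 0.6])]          # hypothetical ambiguity set

plug_in = plug_in_best_response(U, p)
robust = minimax_action(U, candidates)
```

In this toy instance the plug-in rule picks the risky action favored by the forecast, while the worst-case rule prefers the safe action because one candidate distribution disagrees with the forecast. When the candidate set contains only the forecast itself, the two rules coincide, which loosely mirrors the regime in which the minimax rule collapses to the plug-in best response.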
