Assessing model error in counterfactual worlds
Emily Howerton, Justin Lessler
TL;DR
The paper addresses the lack of retrospective evaluation of counterfactual scenario projections by formalizing a framework that separates error due to scenario deviation from error due to model miscalibration. It outlines three evaluation approaches (restricting evaluation to plausible scenarios, inferring error distributions, and estimating observations in modeled scenarios) and tests them in a simulated epidemic setting to see how well each recovers calibration error. Results indicate that approaches leveraging modeled counterfactuals and causal inference (Approaches 2 and 3) recover calibration error more reliably than evaluating only plausible scenarios (Approach 1), with Approach 3 generally preferable when feasible. The work underscores the importance of careful scenario design and provides practical guidance for isolating calibration error to improve decision-making under uncertainty.
Abstract
Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue the latter is most important for assessing the value of models in decision making, but requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.
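The split between scenario deviation and model miscalibration can be illustrated with a toy calculation. The sketch below is not the paper's simulation experiment: the exponential growth law, the function names `true_cases` and `model_cases`, and all parameter values are assumptions chosen only for illustration. It shows that the observed gap between a projection and an observation is the sum of a miscalibration term (projection minus what would have happened had the scenario held) and a scenario deviation term (what would have happened under the scenario minus what actually happened).

```python
import math

def true_cases(transmission_rate, days=30, initial_cases=10.0):
    """Hypothetical 'real world': exponential growth at the given rate."""
    return initial_cases * math.exp(transmission_rate * days)

def model_cases(transmission_rate, days=30, initial_cases=10.0, bias=0.01):
    """Hypothetical model: same growth law, but with a miscalibrated rate."""
    return initial_cases * math.exp((transmission_rate + bias) * days)

# Assumed values: the scenario specified one rate, but reality followed another.
scenario_rate = 0.05
realized_rate = 0.08

projection = model_cases(scenario_rate)           # what the model projected
observation = true_cases(realized_rate)           # what was actually observed
counterfactual_truth = true_cases(scenario_rate)  # unobservable in practice

# Observed error, and its two components.
total_error = projection - observation
miscalibration = projection - counterfactual_truth
scenario_deviation = counterfactual_truth - observation

print(f"total error:        {total_error:.1f}")
print(f"miscalibration:     {miscalibration:.1f}")
print(f"scenario deviation: {scenario_deviation:.1f}")
```

In this toy setting `counterfactual_truth` is available because the "real world" is itself simulated. In practice it cannot be observed, which is why the miscalibration term has to be estimated rather than computed directly.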
