A causal viewpoint on prediction model performance under changes in case-mix: discrimination and calibration respond differently for prognosis and diagnosis predictions
Wouter A. C. van Amsterdam
TL;DR
The paper presents a causal framework linking case-mix shifts to changes in predictive performance. It shows that discrimination and calibration respond differently depending on whether the prediction is prognostic (causal direction) or diagnostic (anti-causal direction). It defines a case-mix shift as a change in the marginal distribution of the cause variable and analyzes $P(Y|X)$ versus $P(X|Y)$. On that basis, it proves that for causal predictions calibration is stable under shifts in $P(X)$ while discrimination is not, and that for anti-causal predictions the reverse holds under shifts in $P(Y)$. The theory is illustrated with simulations and supported by an empirical review of 1,382 models across 2,030 external validations, which found greater variability in discrimination for prognostic models, consistent with the framework. These insights inform model development, evaluation, and deployment across clinical settings. They emphasize aligning features with the causal direction and recalibrating with care when calibration matters across environments.
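The two stability claims can be traced through Bayes' rule. The derivation below is a sketch in notation chosen here, not taken from the paper. Subscripts $s$ and $t$ denote the development (source) and target settings, and $f$ is a model assumed to equal the source risk $P_s(Y=1|X)$. $\pi$ is the prevalence $P(Y=1)$, and $X_1$, $X_0$ are independent draws of $X$ among cases and non-cases.

```latex
% Model: f(x) = P_s(Y = 1 \mid X = x); discrimination: AUC = P(f(X_1) > f(X_0)),
% with X_1 \sim P(X \mid Y = 1) and X_0 \sim P(X \mid Y = 0) drawn independently.

% Causal direction (prognosis): X -> Y, shift in P(X), P(Y | X) fixed.
\begin{aligned}
P_t(Y = 1 \mid f(X) = p) &= \mathbb{E}_t\!\left[P_t(Y = 1 \mid X) \mid f(X) = p\right]
  = \mathbb{E}_t\!\left[f(X) \mid f(X) = p\right] = p
  && \text{(calibration stable)} \\
P_t(X \mid Y = y) &= \frac{P(Y = y \mid X)\, P_t(X)}{P_t(Y = y)}
  && \text{(depends on } P_t(X)\text{, so AUC can change)}
\end{aligned}

% Anti-causal direction (diagnosis): Y -> X, shift in P(Y) = \pi, P(X | Y) fixed.
\begin{aligned}
P_t(X \mid Y = y) &= P_s(X \mid Y = y)
  \;\Rightarrow\; \mathrm{AUC}_t = \mathrm{AUC}_s
  && \text{(discrimination stable)} \\
\frac{P_t(Y = 1 \mid x)}{P_t(Y = 0 \mid x)} &=
  \frac{P_s(Y = 1 \mid x)}{P_s(Y = 0 \mid x)} \cdot
  \frac{\pi_t / (1 - \pi_t)}{\pi_s / (1 - \pi_s)}
  && \text{(calibration breaks unless } \pi_t = \pi_s\text{)}
\end{aligned}
```

In the causal case, the first line uses only the assumption that $P(Y|X)$ is shared across settings. In the anti-causal case, the AUC is a functional of the class-conditional distributions alone, whereas the predicted odds inherit the development prevalence.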
Abstract
Prediction models inform clinical decisions in diagnosis, prognosis, and treatment planning, so their predictive performance must be reliable. The predictive performance of these models is typically assessed through discrimination and calibration. Changes in the distribution of the data affect model performance. The setting in which a model is currently applied may also differ in important ways from the time and place where its performance was last evaluated. In health care, a typical change is a shift in case-mix. For example, in cardiovascular risk management, a general practitioner sees a different mix of patients than a specialist in a tertiary hospital. This work introduces a novel framework that differentiates the effects of case-mix shifts on discrimination and calibration based on the causal direction of the prediction task. When prediction is in the causal direction (often the case for prognosis predictions), calibration remains stable under case-mix shifts, while discrimination does not. Conversely, when predicting in the anti-causal direction (often with diagnosis predictions), discrimination remains stable, but calibration does not. A simulation study and empirical validation using cardiovascular disease prediction models demonstrate the implications of this framework. The causal case-mix framework provides insights for developing, evaluating, and deploying prediction models across different clinical settings, emphasizing the importance of understanding the causal structure of the prediction task.
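The contrast between causal and anti-causal prediction can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's simulation study. The following are all hypothetical choices made for illustration:

- the logistic prognosis mechanism (intercept -1, slope 2, spread of $X$ narrowing from 1 to 0.5);
- the Gaussian diagnosis mechanism (mean shift 1.5, unit variance);
- the development prevalence of 0.2 and target prevalence of 0.5;
- every function name.

In each setting the model is the exact conditional risk in the development population. Discrimination is measured by the AUC (c-statistic). Calibration is measured by calibration-in-the-large, defined as mean predicted risk minus observed event rate.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """AUC (c-statistic) via the Mann-Whitney rank-sum formula, averaging tied ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_in_the_large(preds, labels):
    """Mean predicted risk minus observed event rate; about 0 when calibrated on average."""
    return sum(preds) / len(preds) - sum(labels) / len(labels)


def prognosis_sample(rng, n, x_sd, a=-1.0, b=2.0):
    """Causal direction X -> Y: the shift changes P(X) (its spread), P(Y|X) is fixed.

    Hypothetical parameters a, b. The model predicts the true P(Y=1|X),
    which is identical in every setting.
    """
    preds, labels = [], []
    for _ in range(n):
        x = rng.gauss(0.0, x_sd)
        p = sigmoid(a + b * x)
        preds.append(p)
        labels.append(1 if rng.random() < p else 0)
    return preds, labels


def diagnosis_sample(rng, n, prevalence, dev_prevalence=0.2, delta=1.5):
    """Anti-causal direction Y -> X: the shift changes P(Y), P(X|Y) = N(delta*Y, 1) is fixed.

    Hypothetical parameters. The model is the Bayes posterior P(Y=1|X)
    computed with the development prevalence.
    """
    prior_logit = math.log(dev_prevalence / (1.0 - dev_prevalence))
    preds, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        x = rng.gauss(delta * y, 1.0)
        log_lr = delta * x - delta ** 2 / 2  # log N(x; delta, 1) - log N(x; 0, 1)
        preds.append(sigmoid(prior_logit + log_lr))
        labels.append(y)
    return preds, labels


def evaluate(preds, labels):
    return auc(preds, labels), calibration_in_the_large(preds, labels)


rng = random.Random(2024)
n = 20_000
results = {
    # Development setting vs. a target setting with a narrower case-mix (smaller spread of X).
    "prognosis_dev": evaluate(*prognosis_sample(rng, n, x_sd=1.0)),
    "prognosis_target": evaluate(*prognosis_sample(rng, n, x_sd=0.5)),
    # Development prevalence 0.2 vs. a target setting with prevalence 0.5.
    "diagnosis_dev": evaluate(*diagnosis_sample(rng, n, prevalence=0.2)),
    "diagnosis_target": evaluate(*diagnosis_sample(rng, n, prevalence=0.5)),
}
for name, (auc_value, citl) in results.items():
    print(f"{name:17s} AUC={auc_value:.3f}  calibration-in-the-large={citl:+.3f}")
```

Under these assumptions, the prognosis model's AUC should drop noticeably when the spread of $X$ narrows, while its calibration-in-the-large stays near zero. The diagnosis model's AUC should stay roughly constant across prevalences, while the model underestimates risk in the higher-prevalence setting. Exact values depend on the seed and sample size.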
