Recalibrating binary probabilistic classifiers
Dirk Tasche
TL;DR
The paper addresses calibrating binary probabilistic classifiers to a target prior $q$ when the test data come from a joint distribution $Q(X,Y)$ that differs from the training distribution $P(X,Y)$. It analyzes recalibration under several distribution-shift models and introduces two new methods, parametric covariate shift with posterior drift (CSPD) and ROC-based quasi moment matching (QMM), comparing them with existing approaches in a credit-risk-inspired illustration. A key contribution is linking the recalibration problem to AUC performance under distribution shift, using AUC-consistency as a criterion to identify meaningful transformations. The results suggest that ROC-based QMM methods can provide conservatively biased yet practical recalibrations that respect a predefined prior $q$ while maintaining favorable AUC properties, offering guidance for prudent recalibration in credit risk and similar settings. Overall, the work informs when and how to recalibrate scores that resemble probabilities of default (PD) under distribution shift, with implications for regulatory or stress-testing contexts.
Abstract
Recalibration of binary probabilistic classifiers to a target prior probability is an important task in areas such as credit risk management. However, recalibrating a classifier learned on a training dataset to a target on a test dataset is in general not a well-defined problem, because there may be more than one way to transform the original posterior probabilities such that the target is matched. In this paper, methods for recalibration are analysed from a distribution shift perspective. Distribution shift assumptions linked to the area under the curve (AUC) of a probabilistic classifier are found to be useful for the design of meaningful recalibration methods. Two new methods, parametric covariate shift with posterior drift (CSPD) and ROC-based quasi moment matching (QMM), are proposed and tested together with some other methods in an example setting. The outcomes of the test suggest that the QMM methods discussed in the paper can provide appropriately conservative results in evaluations with concave functions, for instance risk weight functions for credit risk.
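
The non-uniqueness mentioned in the abstract can be made concrete with a small sketch. The Python code below takes a handful of made-up posterior probabilities and recalibrates them to a target prior $q$ in two different ways: a common scaling factor, and a constant shift on the logit scale. Both transformations are generic illustrations chosen here, not the CSPD or QMM methods of the paper, and the function names `scale_to_target` and `logit_shift_to_target` as well as the input values are hypothetical.

```python
import numpy as np


def scale_to_target(p, q):
    """Multiply all probabilities by one factor so that their mean equals q."""
    p = np.asarray(p, dtype=float)
    scaled = p * (q / p.mean())
    if scaled.max() > 1.0:
        raise ValueError("scaling pushes at least one probability above 1")
    return scaled


def logit_shift_to_target(p, q, tol=1e-12, max_iter=200):
    """Add one constant to all logits so that the mean probability equals q.

    The constant is found by bisection; the mean is increasing in the shift.
    Assumes 0 < p < 1 for every entry and 0 < q < 1.
    """
    p = np.asarray(p, dtype=float)
    logit = np.log(p) - np.log1p(-p)
    lo, hi = -50.0, 50.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        mean = np.mean(1.0 / (1.0 + np.exp(-(logit + mid))))
        if mean < q:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    shift = 0.5 * (lo + hi)
    return 1.0 / (1.0 + np.exp(-(logit + shift)))


# Made-up posterior probabilities (illustrative only) and a target prior.
p = np.array([0.01, 0.02, 0.05, 0.10, 0.30])
q = 0.12

scaled = scale_to_target(p, q)
shifted = logit_shift_to_target(p, q)

print("scaled :", np.round(scaled, 4), "mean =", round(float(scaled.mean()), 6))
print("shifted:", np.round(shifted, 4), "mean =", round(float(shifted.mean()), 6))
```

Both outputs have mean $q$ and preserve the ranking of the inputs, yet the individual probabilities differ. Matching the target prior alone therefore does not pin down the recalibration, which is why additional distribution shift assumptions are needed to select among candidate transformations.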
