From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

Agathe Fernandes Machado; Arthur Charpentier; Emmanuel Flachaire; Ewen Gallic; François Hu

TL;DR

The paper tackles the misalignment between discriminative performance and probabilistic calibration in binary classification, arguing that well-calibrated scores are essential for decision-making in domains such as finance and healthcare. It introduces the Local Calibration Score (LCS), a calibration metric based on a smooth calibration curve obtained by local regression, and compares it to traditional quantile-based metrics such as the Expected Calibration Error (ECE). Using synthetic data with known probabilities and a real-world credit-default dataset, the authors show that LCS tracks true miscalibration more accurately and that recalibration methods (Platt scaling, isotonic regression, Beta calibration, and local regression) improve calibration, sometimes at a small cost in discriminative ability (AUC). They further demonstrate that optimizing solely for AUC can degrade calibration, and that a regression-based Random Forest yields better calibration than a Random Forest classifier, underscoring the practical value of calibration-aware model tuning.
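To make the quantile-based metric concrete, here is a minimal Python sketch of an ECE computed over equal-count bins. The function name `quantile_ece`, the bin count, and the toy distortion `p ** 3` are illustrative assumptions, not the authors' code or simulation design.

```python
import numpy as np

def quantile_ece(scores, labels, n_bins=10):
    """ECE over equal-count (quantile) bins.

    Each bin contributes |observed event rate - mean score|,
    weighted by the share of observations it holds.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(scores)
    # Split the sorted observations into nearly equal-sized bins.
    bins = np.array_split(order, n_bins)
    n = len(scores)
    ece = 0.0
    for idx in bins:
        if len(idx) == 0:
            continue
        mean_score = scores[idx].mean()
        event_rate = labels[idx].mean()
        ece += len(idx) / n * abs(event_rate - mean_score)
    return ece

# Toy data with known probabilities (hypothetical setup for illustration).
rng = np.random.default_rng(0)
p = rng.uniform(size=20_000)          # true event probabilities
d = rng.binomial(1, p)                # observed binary outcomes
ece_true = quantile_ece(p, d)         # scores equal to the true probabilities
ece_distorted = quantile_ece(p ** 3, d)  # deliberately distorted scores
print(ece_true, ece_distorted)
```

In this toy setup the distorted scores keep the same ranking as the true probabilities, so AUC is unchanged while the ECE grows, which is the kind of gap between discrimination and calibration the paper is concerned with.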

Abstract

The assessment of binary classifier performance traditionally centers on discriminative ability, using metrics such as accuracy. However, these metrics often disregard the model's inherent uncertainty, especially when dealing with sensitive decision-making domains, such as finance or healthcare. Given that model-predicted scores are commonly seen as event probabilities, calibration is crucial for accurate interpretation. In our study, we analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Calibration Score. Comparing recalibration methods, we advocate for local regressions, emphasizing their dual role as effective recalibration tools and facilitators of smoother visualizations. We apply these findings in a real-world scenario, using a Random Forest classifier and a Random Forest regressor to predict credit default while simultaneously measuring calibration during performance optimization.
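The idea of scoring calibration from a smoothed calibration curve can be sketched as follows. This is a rough illustration only: it assumes the score is a density-weighted squared gap between the smoothed curve and the diagonal, and it swaps in a Gaussian-kernel, degree-0 local regression (Nadaraya-Watson) with a fixed bandwidth. The function names, bandwidth, and grid size are hypothetical choices, not the authors' settings.

```python
import numpy as np

def smoothed_calibration_curve(scores, labels, grid, bandwidth=0.05):
    """Degree-0 local regression of labels on scores, evaluated on `grid`.

    Returns the smoothed event rate at each grid point and the kernel
    mass there (an unnormalized estimate of the score density).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Gaussian kernel weights, shape (len(grid), n_observations).
    z = (grid[:, None] - scores[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2)
    mass = kernel.sum(axis=1)
    curve = (kernel @ labels) / np.maximum(mass, 1e-12)
    return curve, mass

def local_calibration_score(scores, labels, n_grid=100, bandwidth=0.05):
    """Density-weighted mean squared gap between the curve and the diagonal."""
    grid = np.linspace(0.0, 1.0, n_grid)
    curve, mass = smoothed_calibration_curve(scores, labels, grid, bandwidth)
    weights = mass / mass.sum()  # emphasize regions where scores are observed
    return float(np.sum(weights * (curve - grid) ** 2))

# Toy data with known probabilities (hypothetical setup for illustration).
rng = np.random.default_rng(1)
p = rng.uniform(size=5_000)
d = rng.binomial(1, p)
lcs_true = local_calibration_score(p, d)
lcs_distorted = local_calibration_score(p ** 3, d)
print(lcs_true, lcs_distorted)
```

The same smoothed curve can be plotted against the diagonal for visual inspection, which is the dual role of local regression that the abstract highlights.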

Paper Structure

This paper contains 34 sections, 1 theorem, 22 equations, 20 figures, and 1 table.

Introduction
Calibration
Measuring Calibration
Quantile based measures
Calibration curve
Expected Calibration Error
Local Regression based measure
Smoothed calibration curve
Local Calibration Score
Impact of a Poor Calibration
Synthetic data
Recalibration
Recalibration Methods
Platt Scaling
Isotonic Regression
...and 19 more sections

Key Result

Proposition 2.1

Consider a dataset $\{(d_i,\mathbf{x}_i)\}$, where $\mathbf{x}$ are $k$ features ($k$ being fixed), so that $D \mid \boldsymbol{X}=\mathbf{x} \sim \mathcal{B}(s(\mathbf{x}))$, where [equation missing]. Let $\widehat{\beta}_0$ and $\widehat{\boldsymbol{\beta}}$ denote maximum likelihood estimators. Then, for any $\mathbf{x}$, the score defined as [equation missing] is well-calibrated in the sense that [equation missing].
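As an illustration of the setting in Proposition 2.1, the display below writes it out under the assumption of a logistic link. The intercept $\beta_0$, the coefficient vector $\boldsymbol{\beta}$, and the use of maximum likelihood suggest this link, but this summary does not state it. The convergence statement in the last line is likewise an assumed form, based on the usual definition of calibration, $\mathbb{E}[D \mid \widehat{s}(\mathbf{x}) = p] = p$; the paper's exact wording may differ.

```latex
\begin{align*}
s(\mathbf{x}) &= \frac{1}{1+\exp\!\big(-(\beta_0 + \mathbf{x}^\top \boldsymbol{\beta})\big)}
  && \text{(assumed true probability)} \\
\widehat{s}(\mathbf{x}) &= \frac{1}{1+\exp\!\big(-(\widehat{\beta}_0 + \mathbf{x}^\top \widehat{\boldsymbol{\beta}})\big)}
  && \text{(assumed fitted score)} \\
\widehat{s}(\mathbf{x}) &\xrightarrow{\;\mathbb{P}\;} \mathbb{E}\big[D \mid \widehat{s}(\mathbf{x})\big]
  \quad \text{as } n \to \infty
  && \text{(assumed sense of calibration)}
\end{align*}
```

Under this reading, the result says that a correctly specified logistic regression fitted by maximum likelihood produces scores that can be interpreted as event probabilities in large samples.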

Figures (20)

Figure 1: Distorted Probabilities as a Function of True Probabilities, Depending on the Value of $\alpha$ (left) or $\gamma$ (right).
Figure 2: Calibration Metrics on 200 Simulations for each Value of $\alpha$ (top) or $\gamma$ (bottom).
Figure 3: Calibration Curve Obtained with Local Regression, on 200 Simulations for each Value of $\alpha$ (top) or $\gamma$ (bottom). The distributions of the true probabilities are shown in the histograms (gold for $d=1$, purple for $d=0$).
Figure 4: Standard Goodness of Fit Metrics on 200 Simulations for each Value of $\alpha$ (top) or $\gamma$ (bottom). The probability threshold is set to $\tau=0.5$.
Figure 5: Metrics After Recalibration (for $\gamma=3$), on the Calibration (transparent colors) and on the Test Set (full colors).
...and 15 more figures

Theorems & Definitions (2)

Proposition 2.1
proof
