Adaptive Set-Mass Calibration with Conformal Prediction

Daniil Kazantsev, Mohsen Guizani, Eric Moulines, Maxim Panov, Nikita Kotelevskii

TL;DR

This work introduces cumulative mass calibration (CMC) and the CMCE metric for evaluating set-valued calibration, addressing the gap that traditional confidence and class-wise calibration do not guarantee the validity of predictive sets. It builds on split conformal prediction to obtain marginally valid predictive sets, then applies two simple post-hoc procedures, mass rescaling and temperature scaling, to enforce a cumulative mass constraint at a chosen level $1-\alpha$, yielding $\alpha$-cumulative-mass-calibrated classifiers with marginal guarantees. Empirically, the proposed methods consistently improve CMCE, and often other metrics (ECE, cw-ECE, NLL, Brier), on benchmarks with many classes (e.g., CIFAR-100, ImageNet, iNaturalist21), and produce near-ideal cumulative-mass calibration curves, especially as the number of classes grows. The results demonstrate practical, scalable calibration with theoretical marginal guarantees, while highlighting limitations related to the choice of $\alpha$, conditional calibration, and potential conservativeness in heterogeneous data settings.
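
The two-stage idea summarized above (a split conformal set, then a mass rescaling so that the set carries probability $1-\alpha$) can be illustrated with a minimal sketch. This is not the authors' implementation: the conformity score $1 - p(y\mid x)$, the rule of always including the top class, and the function names `conformal_threshold`, `prediction_set`, and `rescale_mass` are all assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for the (assumed) score 1 - p(true label)."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank; if it exceeds n, every label is kept.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    return np.inf if k > n else scores[k - 1]

def prediction_set(probs, qhat):
    """Boolean mask of labels whose score 1 - p is at most qhat."""
    mask = probs >= 1.0 - qhat
    # Assumption: always keep the top class so that no set is empty.
    mask[np.arange(len(probs)), probs.argmax(axis=1)] = True
    return mask

def rescale_mass(probs, mask, alpha, eps=1e-12):
    """Rescale each row so the set holds mass 1 - alpha and the rest holds alpha."""
    in_mass = np.sum(probs * mask, axis=1, keepdims=True)
    out_mass = 1.0 - in_mass
    # Rows with no mass outside the set cannot be rescaled; leave them unchanged.
    full = out_mass <= eps
    in_scale = np.where(full, 1.0, (1.0 - alpha) / in_mass)
    out_scale = np.where(full, 1.0, alpha / np.maximum(out_mass, eps))
    return np.where(mask, probs * in_scale, probs * out_scale)

# Toy usage on random softmax outputs (synthetic data, for illustration only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([rng.choice(10, p=p) for p in probs])

qhat = conformal_threshold(probs[:100], labels[:100], alpha=0.1)
mask = prediction_set(probs[100:], qhat)
calibrated = rescale_mass(probs[100:], mask, alpha=0.1)
```

Rescaling within and outside the set separately keeps the ranking of classes inside each group unchanged, which is one simple way to meet the mass constraint without retraining the model.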

Abstract

Reliable probabilities are critical in high-risk applications, yet common calibration criteria (confidence, class-wise) are only necessary for full distributional calibration, and post-hoc methods often lack distribution-free guarantees. We propose a set-based notion of calibration, cumulative mass calibration, and a corresponding empirical error measure: the Cumulative Mass Calibration Error (CMCE). We develop a new calibration procedure that starts with conformal prediction to obtain a set of labels that gives the desired coverage. We then instantiate two simple post-hoc calibrators: a mass normalization and a temperature scaling-based rule, tuned to the conformal constraint. On multi-class image benchmarks, especially with a large number of classes, our methods consistently improve CMCE and standard metrics (ECE, cw-ECE, MCE) over baselines, delivering a practical, scalable framework with theoretical guarantees.
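
A temperature-scaling rule "tuned to the conformal constraint" could look like the following sketch, which searches for a per-example temperature $T$ such that the softmax mass on a given label set equals $1-\alpha$. The per-example search, the bisection bracket, and the names `softmax`, `set_mass`, and `fit_temperature` are assumptions for illustration; the paper's actual rule may differ. The sketch relies on the set containing the highest-logit classes, in which case the set mass decreases monotonically in $T$.

```python
import numpy as np

def softmax(logits, T):
    """Numerically stable softmax of logits / T."""
    z = (logits - logits.max()) / T
    e = np.exp(z)
    return e / e.sum()

def set_mass(logits, mask, T):
    """Probability mass that softmax(logits / T) places on the set."""
    return softmax(logits, T)[mask].sum()

def fit_temperature(logits, mask, alpha, lo=1e-3, hi=1e3, iters=100):
    """Bisection in log-space for T with set mass equal to 1 - alpha.

    Assumes the set holds the top-ranked classes, so mass decreases with T.
    If the target lies outside the bracket, the nearer endpoint is returned
    (a fallback chosen for this sketch).
    """
    target = 1.0 - alpha
    if set_mass(logits, mask, hi) >= target:
        return hi  # even near-uniform probabilities exceed the target
    if set_mass(logits, mask, lo) <= target:
        return lo  # even near-one-hot probabilities fall short
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        if set_mass(logits, mask, mid) > target:
            lo = mid  # too much mass on the set: soften further
        else:
            hi = mid
    return np.sqrt(lo * hi)

# Toy usage: a five-class example whose set is the top two classes.
logits = np.array([3.0, 1.0, 0.5, -1.0, -2.0])
mask = np.array([True, True, False, False, False])
T = fit_temperature(logits, mask, alpha=0.1)
calibrated = softmax(logits, T)
```

Unlike a direct rescaling of probabilities, a temperature changes all class probabilities through a single scalar, so the ordering of classes is preserved across the whole label space.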

Paper Structure

This paper contains 52 sections, 27 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: Left: Evaluation on the synthetic data as the number of classes $K$ increases. NLL and Brier are computed with ground-truth $p(y\mid x)$. Our method achieves consistently lower NLL, Brier, and CMCE, with gains that widen as $K$ grows. Right: Example sample from the synthetic dataset with $K=49$; points are colored by class.
  • Figure 2: Cumulative mass calibration curves for different calibrators on two datasets. Left: ImageNet. Right: iNaturalist21.
  • Figure 3: Coverage sensitivity on CIFAR-100 (ResNet-56) around target levels $1-\alpha$. Left: $\alpha=0.1$. Right: $\alpha=0.01$. The dashed horizontal line marks the target coverage $1-\alpha$; curves that stay closer to this line across offsets $\epsilon$ indicate better mass calibration. Our methods (Naive CMCE and TS (ours)) adhere most closely to the target in both panels, while several classical baselines systematically under- or over-cover.

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5