Calibration through the Lens of Interpretability

Alireza Torabian; Ruth Urner

TL;DR

This work develops an axiomatic framework for calibration that separates calibration from accuracy and interpretability. It formalizes five desiderata (calibration, accuracy, approximation of the regression function, interpretability via small, identifiable cells, and monotonicity with respect to the data-generating regression function) and analyzes their mutual relationships. It introduces relaxed, population-level metrics (CE_p,D, RMSE, PC, KT) and analyzes two interpretability-preserving operations, cell merging and average label assignment, deriving their theoretical effects on calibration and related measures. In an extensive empirical study on 36 real datasets, it compares interpretable decision trees (DT) to standard calibration methods (Platt scaling, isotonic regression, and PCT), showing that DT can offer competitive calibration while providing interpretable outputs, with PDE emerging as a favorable calibration metric. The paper argues for incorporating interpretability as a core criterion in calibration to ensure meaningful confidence scores for end users.

Abstract

Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy. A calibrated model is a function whose values correctly reflect underlying label probabilities. Calibration in itself, however, implies neither classification accuracy nor human-interpretable estimates, and verifying calibration from finite data is not straightforward. There is a plethora of evaluation metrics (and loss functions), each of which assesses a specific aspect of a calibration model. In this work, we initiate an axiomatic study of the notion of calibration. We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences. We complement this analysis with an empirical evaluation, comparing common calibration methods to employing a simple, interpretable decision tree.
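
As a rough illustration of the average label assignment operation named in the TL;DR, the sketch below assumes that a model partitions the inputs into cells (for a decision tree, the leaves) and replaces the score in each cell with the mean binary label observed there. The function names and the toy data are hypothetical and not taken from the paper, and the cell-wise error computed here is a simple in-sample quantity, not the paper's population-level CE_p,D metric.

```python
from collections import defaultdict


def average_label_assignment(cells, labels):
    """Return a dict mapping each cell id to the mean binary label in that cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [label sum, count]
    for cell, y in zip(cells, labels):
        totals[cell][0] += y
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


def cellwise_calibration_error(cells, scores, labels):
    """Size-weighted mean absolute gap between average score and average label per cell.

    Hypothetical in-sample measure, used here only to show the effect of
    average label assignment on the data it was fitted to.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [score sum, label sum]
    for cell, s, y in zip(cells, scores, labels):
        sums[cell][0] += s
        sums[cell][1] += y
    n_total = len(labels)
    # (n_cell / n_total) * |mean score - mean label| simplifies to |sum gap| / n_total
    return sum(abs(s_sum - y_sum) for s_sum, y_sum in sums.values()) / n_total


# Toy data (made up): two cells, e.g. two leaves of a small tree.
cells = ["a", "a", "a", "b", "b"]
labels = [1, 0, 1, 0, 0]
raw_scores = [0.9, 0.9, 0.9, 0.2, 0.2]

cell_means = average_label_assignment(cells, labels)
recalibrated = [cell_means[c] for c in cells]

error_before = cellwise_calibration_error(cells, raw_scores, labels)
error_after = cellwise_calibration_error(cells, recalibrated, labels)
```

On the data used to compute the cell means, the cell-wise gap vanishes up to floating-point rounding. Whether the resulting scores are calibrated on unseen data depends on how well each cell's sample mean estimates the true label probability, which is one reason cell size matters for the interpretability desideratum.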
