Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques
Alon Arad, Saharon Rosset
TL;DR
The paper addresses multiclass calibration by extending isotonic regression to normalization-aware settings. It introduces two non-parametric approaches, NA-FIR and SCIR, that integrate probability normalization either directly in the optimization (NA-FIR) or through a cumulative, rank-aware formulation (SCIR). Empirical results across tasks show consistent improvements in NLL and conf-ECE, establishing normalization-aware isotonic methods as strong non-parametric alternatives to parametric calibrators in diverse domains. The work highlights practical trade-offs between calibration quality and computational scalability, offering a flexible toolkit for reliable probabilistic predictions in multiclass problems.
Abstract
Accurate and reliable probability predictions are essential for multi-class supervised learning tasks, where well-calibrated models enable rational decision-making. While isotonic regression has proven effective for binary calibration, its extension to multi-class problems via one-vs-rest calibration has produced suboptimal results compared to parametric methods, limiting its practical adoption. In this work, we propose novel normalization-aware isotonic techniques for multi-class calibration, grounded in natural and intuitive assumptions expected by practitioners. Unlike prior approaches, our methods inherently account for probability normalization by either incorporating normalization directly into the optimization process (NA-FIR) or modeling the problem as a cumulative bivariate isotonic regression (SCIR). Empirical evaluation on a variety of text and image classification datasets across different model architectures shows that our approach consistently improves negative log-likelihood (NLL) and expected calibration error (ECE) metrics.
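For context, the one-vs-rest isotonic baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of that baseline only, not of NA-FIR or SCIR: each class gets its own isotonic fit, and normalization is applied afterwards as a separate step, which is the decoupling the proposed methods aim to avoid. All function names (`pav`, `fit_isotonic`, `fit_ovr`, `calibrate`, `nll`) are hypothetical, and the uniform fallback for an all-zero row is an assumption of this sketch.

```python
import bisect
import math


def pav(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block stores [sum of values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def fit_isotonic(scores, targets):
    """Fit a monotone step function from score to probability."""
    # Sorting tied scores by descending target makes PAV pool them together.
    order = sorted(range(len(scores)), key=lambda i: (scores[i], -targets[i]))
    xs = [scores[i] for i in order]
    ys = pav([targets[i] for i in order])
    return xs, ys


def predict(model, x):
    """Evaluate the fitted step function at score x."""
    xs, ys = model
    i = bisect.bisect_right(xs, x) - 1
    return ys[max(i, 0)]  # scores below the smallest knot use the first value


def fit_ovr(probs, labels, n_classes):
    """One isotonic model per class, each trained on 'class k vs. rest'."""
    return [
        fit_isotonic([p[k] for p in probs],
                     [1.0 if y == k else 0.0 for y in labels])
        for k in range(n_classes)
    ]


def calibrate(models, p, eps=1e-12):
    """Apply per-class maps, then renormalize post hoc (the baseline's weak spot)."""
    raw = [predict(m, p[k]) for k, m in enumerate(models)]
    total = sum(raw)
    if total <= eps:  # assumption of this sketch: fall back to uniform
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)
```

Because the per-class fits never see the normalization step, the renormalized outputs are not guaranteed to keep the least-squares optimality or even the per-class calibration of the individual fits, which motivates folding normalization into the fitting procedure itself.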
