Multiclass Local Calibration With the Jensen-Shannon Distance
Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana
TL;DR
This work introduces multiclass local calibration to address proximity bias in probability estimates. It formalizes a locality‑aware notion of calibration and relates it to strong calibration. It then proposes Local Calibration Networks (LCNs), which learn new feature representations and logits so that predictions align with locally observed class frequencies, as measured by the Jensen‑Shannon distance, while keeping inference efficient. The authors provide theoretical results, including bounds for continuous MECE and binning‑based metrics under local calibration, and analyze the bias–variance trade‑offs inherent in kernel‑based locality estimates. Empirically, LCNs achieve better local calibration across datasets while maintaining competitive global calibration. On harder multiclass tasks they also improve predictive performance (lower NLL and higher accuracy), which supports their use for regionally calibrated predictions in high‑stakes settings.
Abstract
Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notions in multiclass classification, strong calibration is the most stringent, as it requires all predicted probabilities to be simultaneously calibrated across all classes. However, existing approaches to multiclass calibration lack a notion of distance among inputs, which makes them vulnerable to proximity bias: predictions in sparse regions of the feature space are systematically miscalibrated. This is especially relevant in high-stakes settings, such as healthcare, where instances in sparse regions are exactly those most at risk of biased treatment. In this work, we address this shortcoming by introducing a local perspective on multiclass calibration. First, we formally define multiclass local calibration and establish its relationship with strong calibration. Second, we theoretically analyze the pitfalls of existing evaluation metrics when applied to multiclass local calibration. Third, we propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies using the Jensen-Shannon distance. Finally, we empirically validate our approach against existing multiclass calibration techniques.
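
To make the alignment idea concrete, the following is a minimal NumPy sketch of the two ingredients named above: a local estimate of class frequencies around an input, and the Jensen-Shannon distance between that estimate and a predicted probability vector. It is an illustration under stated assumptions and not the authors' implementation. The Gaussian kernel, the `bandwidth` parameter, and the function names `js_distance` and `local_class_frequencies` are hypothetical choices made here for the example.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Entries with a == 0 contribute 0 by convention; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(jsd, 0.0)))


def local_class_frequencies(x, X, y, n_classes, bandwidth=1.0):
    """Kernel-weighted class frequencies around x.

    Assumption: a Gaussian kernel on Euclidean distance in feature space.
    X has shape (n, d), y holds integer labels in [0, n_classes).
    """
    X = np.asarray(X, dtype=float)
    d2 = np.sum((X - np.asarray(x, dtype=float)) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # closer points weigh more
    onehot = np.eye(n_classes)[np.asarray(y)]  # shape (n, n_classes)
    return w @ onehot / w.sum()


def local_miscalibration(pred, x, X, y, bandwidth=1.0):
    """JS distance between a prediction and the local frequency estimate at x."""
    freq = local_class_frequencies(x, X, y, len(pred), bandwidth)
    return js_distance(pred, freq)
```

With base-2 logarithms the distance lies in [0, 1], where 0 means the prediction matches the local estimate exactly. The bandwidth controls how local the estimate is: a small value uses few neighbours (low bias, high variance), and a large value approaches the global class frequencies. If every point is very far from `x` relative to the bandwidth, the weights can underflow to zero, so a practical version would need to handle that case.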
