Multiclass Local Calibration With the Jensen-Shannon Distance

Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana

TL;DR

This work introduces multiclass local calibration to address proximity bias in probability estimates, formalizing a locality‑aware calibration notion and linking it to strong calibration. It then proposes Local Calibration Networks (LCNs) that learn new feature representations and logits to align predictions with locally observed class frequencies via the Jensen‑Shannon distance, while keeping inference efficient. The authors provide theoretical results, including bounds for continuous MECE and binning‑based metrics under local calibration, and analyze the bias–variance trade‑offs inherent in kernel‑based locality estimates. Empirically, LCNs achieve superior local calibration across datasets and maintain competitive global calibration, also delivering predictive performance gains (lower NLL and higher accuracy) on harder multiclass tasks, demonstrating practical impact for trustworthy, regionally calibrated predictions in high‑stakes settings.

Abstract

Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notions in multiclass classification, strong calibration is the most stringent, as it requires all predicted probabilities to be simultaneously calibrated across all classes. However, existing approaches to multiclass calibration lack a notion of distance among inputs, which makes them vulnerable to proximity bias: predictions in sparse regions of the feature space are systematically miscalibrated. This is especially relevant in high-stakes settings, such as healthcare, where the sparse instances are exactly those most at risk of biased treatment. In this work, we address this main shortcoming by introducing a local perspective on multiclass calibration. First, we formally define multiclass local calibration and establish its relationship with strong calibration. Second, we theoretically analyze the pitfalls of existing evaluation metrics when applied to multiclass local calibration. Third, we propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies using the Jensen-Shannon distance. Finally, we empirically validate our approach against existing multiclass calibration techniques.
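To make the idea of aligning predictions with local class-frequency estimates concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the base-2 logarithm, and the function names `local_class_frequencies` and `jensen_shannon_distance` are all assumptions chosen for illustration. It estimates class frequencies around each point with kernel weights, then measures the Jensen-Shannon distance (the square root of the Jensen-Shannon divergence) between those estimates and the predicted probabilities.

```python
import numpy as np


def local_class_frequencies(features, labels, num_classes, bandwidth=1.0):
    """Kernel-weighted estimate of class frequencies around each point.

    Assumption: a Gaussian kernel on Euclidean distance; the paper's
    kernel and feature space may differ.
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    # Gaussian kernel weights; each point also weights itself with 1.
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    one_hot = np.eye(num_classes)[labels]
    # Weighted class counts, normalized so each row sums to 1.
    freqs = weights @ one_hot
    return freqs / freqs.sum(axis=1, keepdims=True)


def jensen_shannon_distance(p, q, eps=1e-12):
    """Row-wise Jensen-Shannon distance between two sets of distributions.

    Uses base-2 logarithms, so the distance lies in [0, 1].
    """
    # Clip to avoid log(0); eps is small enough to leave values unchanged
    # for practical purposes.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m), axis=-1)
    kl_qm = np.sum(q * np.log2(q / m), axis=-1)
    jsd = 0.5 * (kl_pm + kl_qm)
    # Guard against tiny negative values from floating-point error.
    return np.sqrt(np.maximum(jsd, 0.0))


# Hypothetical usage on random data: a per-sample local miscalibration score.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 3))
labs = rng.integers(0, 4, size=20)
probs = rng.dirichlet(np.ones(4), size=20)
local_freqs = local_class_frequencies(feats, labs, num_classes=4)
scores = jensen_shannon_distance(probs, local_freqs)
```

In this sketch a smaller `bandwidth` makes the frequency estimate more local but noisier, since fewer neighbours carry meaningful weight. That is the kind of bias-variance trade-off the TL;DR attributes to kernel-based locality estimates.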

Paper Structure

This paper contains 57 sections, 7 theorems, 120 equations, 4 figures, 2 tables.

Key Result

Theorem 1

Let $D$ be an evaluation dataset drawn i.i.d. from a distribution $\mathcal{P}$. Define the continuous Multidimensional Expected Calibration Error (MECE) as: If a model $f$ satisfies local calibration, then there exists $k \in [1/C,1]$ such that continuous $MECE$ is asymptotically upper bounded as:

Figures (4)

  • Figure 1: Our LoCal Nets (LCN) provide local calibration through feature reshaping. (\ref{fig:arch}) Unlike post-hoc calibrators that rescale fixed logits, LCNs jointly (i) learn reduced feature representations $\phi'(\mathbf{x})$ and (ii) output new calibrated logits, aligning predictions with local class frequencies via the Jensen–Shannon distance. (\ref{fig:effect}) On CIFAR-10 with ResNet-50, LCNs yield tighter, better-separated class clusters and improved calibration ($\approx 64\%$ reduction in MLCE, $\approx 36\%$ reduction in LCE).
  • Figure 2: Empirical global calibration metrics (Q1) over five runs. The lower the better.
  • Figure 3: Empirical local calibration metrics (Q2) over five runs. The lower the better.
  • Figure 4: Points in the decision space (right) and their mapping to density-confidence space for calibration (left).

Theorems & Definitions (18)

  • Definition 1: Multiclass Local Calibration
  • Theorem 1: Continuous MECE under Local Calibration
  • Definition 2: General binning calibration metric
  • Theorem 2: Error decomposition of calibration metrics under Local Calibration
  • Definition 3: $\rho$-Perfect Local Calibration
  • Corollary 1: Calibration measure under $\rho$-Perfect Local Calibration
  • Theorem 3: Probabilistic bound for multiclass LCE under Local Calibration
  • Theorem 4: Asymptotic consistency of JSD
  • proof
  • proof
  • ...and 8 more