Accuracy-Preserving Calibration via Statistical Modeling on Probability Simplex

Yasushi Esaki, Akihiro Nakamura, Keisuke Kawano, Ryoko Tokuhisa, Takuro Kutsuna

TL;DR

The paper tackles the problem of calibrating neural network confidences without sacrificing predictive accuracy. It introduces Simplex Temperature Scaling (STS), which models predictions on the probability simplex with a two-parameter Concrete distribution, decoupling the classifier (location parameter) from calibration (temperature). A key theoretical result shows that a cross-entropy-trained DNN optimizes the simplex location parameter regardless of temperature, enabling accuracy preservation during calibration. STS uses Multi-Mixup to synthetically generate simplex-labeled samples for calibrating the temperature, reducing the overhead of ensemble methods. Empirical results across multiple image datasets show STS achieving superior calibration (lower ECE) and better out-of-distribution detection compared to Temperature Scaling baselines and Dirichlet-based approaches, highlighting its practical impact for reliable uncertainty estimation in safety-critical systems.
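The decoupling described above can be illustrated with a small sampling sketch. It assumes the standard Gumbel-softmax construction of the Concrete distribution, in which a sample is the softmax of (log location + Gumbel noise) divided by the temperature. The function name `sample_concrete`, the three-class location vector, and the two temperature values are hypothetical choices made for illustration and are not taken from the paper or its code release.

```python
import numpy as np


def sample_concrete(alpha, lam, gumbel):
    """Map Gumbel(0, 1) noise to Concrete samples with location alpha and temperature lam."""
    logits = (np.log(alpha) + gumbel) / lam
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilise the softmax
    expd = np.exp(logits)
    return expd / expd.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
alpha = np.array([0.6, 0.3, 0.1])               # hypothetical location parameter (3 classes)
gumbel = rng.gumbel(size=(20000, alpha.size))   # the same noise is reused for both temperatures

sharp = sample_concrete(alpha, 0.5, gumbel)     # low temperature: samples near the vertices
flat = sample_concrete(alpha, 5.0, gumbel)      # high temperature: samples near the centre

# Dividing by a positive temperature never changes which coordinate is largest.
same_argmax = bool(np.all(sharp.argmax(axis=1) == flat.argmax(axis=1)))

# The winning class follows the normalised location parameter (Gumbel-max trick).
argmax_freq = np.bincount(sharp.argmax(axis=1), minlength=alpha.size) / len(sharp)

# The temperature only changes how concentrated each sample is.
mean_top_sharp = float(sharp.max(axis=1).mean())
mean_top_flat = float(flat.max(axis=1).mean())
```

In this sketch the predicted class depends only on the location parameter, while the temperature controls how peaked the simplex samples are. That is the intuition behind adjusting the temperature for calibration without changing accuracy.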

Abstract

Classification models based on deep neural networks (DNNs) must be calibrated to measure the reliability of predictions. Some recent calibration methods have employed a probabilistic model on the probability simplex. However, these calibration methods cannot preserve the accuracy of pre-trained models, even those with a high classification accuracy. We propose an accuracy-preserving calibration method using the Concrete distribution as the probabilistic model on the probability simplex. We theoretically prove that a DNN model trained on cross-entropy loss has optimality as the parameter of the Concrete distribution. We also propose an efficient method that synthetically generates samples for training probabilistic models on the probability simplex. We demonstrate that the proposed method can outperform previous methods in accuracy-preserving calibration tasks using benchmarks. The code is available at https://github.com/ToyotaCRDL/SimplexTS.

Paper Structure

This paper contains 47 sections, 2 theorems, 25 equations, 16 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

We assume that $p(y|\bm{\pi})$ and $p(\bm{\pi}|\bm{x})$ are formulated as Eqs. (eq:ind) and (eq:con). Then, Eq. (eq:predictive) is given as follows. For confidence, another criterion is applied instead of Eq. (eq:predictive). The confidence is explained in Section sec:confidence.

Figures (16)

  • Figure 1: Overview of our proposed method, Simplex Temperature Scaling (STS). The Concrete distribution has two parameters, which are computed using a given pre-trained DNN model and an additional branch.
  • Figure 2: Architecture of the additional branch for the temperature parameter $\lambda(\bm{x}, \bm{\theta}_{\mathrm{\lambda}})$ in the experiments. We used the same architecture regardless of the architecture of the pre-trained DNN model.
  • Figure 3: Transitions of Expected Calibration Errors (ECEs) when we varied the value of $\beta$ from 0.2 to 2.0 in 0.1 increments. Each point indicates the median of 5 trials and the shading indicates the standard deviation.
  • Figure 4: Dirichlet distribution.
  • Figure 5: Concrete distribution.
  • ...and 11 more figures
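Figure 3 reports Expected Calibration Error (ECE). As a reference point, a minimal sketch of the commonly used equal-width-bin ECE estimator follows. The function name, the default of 15 bins, and the toy inputs are assumptions made for illustration; the binning used in the paper's experiments may differ.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| over bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; a confidence of exactly 0 falls in bin 0.
    bin_ids = np.clip(np.digitize(confidences, edges, right=True) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in bin b
    return float(ece)


# Toy data: four predictions at confidence 0.9 (three correct), two at 0.6 (one correct).
toy_conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
toy_correct = [1, 1, 1, 0, 1, 0]
toy_ece = expected_calibration_error(toy_conf, toy_correct, n_bins=10)

# A model that is always fully confident and always right has zero ECE.
perfect_ece = expected_calibration_error([1.0, 1.0, 1.0], [1, 1, 1])
```

A lower value means that the reported confidences track the empirical accuracy more closely.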

Theorems & Definitions (5)

  • Definition: Concrete distribution
  • Remark 1: Properties of the temperature parameter
  • Theorem 1: Predictive distribution
  • Corollary 1: Classification via the Concrete distribution
  • Remark 2: Interpretations of Corollary 1