
Probability Bounding: Post-Hoc Calibration via Box-Constrained Softmax

Kyohei Atarashi, Satoshi Oyama, Hiromi Arai, Hisashi Kashima

TL;DR

This work proposes probability bounding (PB), a novel post-hoc calibration method that mitigates both underconfidence and overconfidence by learning lower and upper bounds on the output probabilities.

Abstract

Many studies have observed that modern neural networks achieve high accuracy while producing poorly calibrated probabilities, making calibration a critical practical issue. In this work, we propose probability bounding (PB), a novel post-hoc calibration method that mitigates both underconfidence and overconfidence by learning lower and upper bounds on the output probabilities. To implement PB, we introduce the box-constrained softmax (BCSoftmax) function, a generalization of Softmax that explicitly enforces lower and upper bounds on the output probabilities. Although BCSoftmax is defined as the solution to a box-constrained optimization problem, we develop an exact and efficient algorithm for computing it. We further provide theoretical guarantees for PB and introduce two variants of PB. Experiments on four real-world datasets demonstrate the effectiveness of our methods, which consistently reduce calibration errors. Our Python implementation is available at https://github.com/neonnnnn/torchbcsoftmax.
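To make the optimization view concrete, here is one formulation consistent with the description above. It is an assumption for illustration, not a definition quoted from the paper: it starts from the variational form of $\mathrm{Softmax}$ (entropy-regularized linear maximization over the probability simplex $\Delta^K$ for $K$ classes, with temperature $\tau$) and adds the box $\bm{a} \le \bm{p} \le \bm{b}$.

$$
\mathrm{BCSoftmax}(\bm{g}; \bm{a}, \bm{b}) = \mathop{\mathrm{arg\,max}}_{\bm{p} \in \Delta^K \cap [\bm{a}, \bm{b}]} \; \bm{g}^\top \bm{p} - \tau \sum_{i=1}^{K} p_i \log p_i
$$

Because the Lagrangian of this strictly concave objective is separable across coordinates, the KKT conditions reduce the solution to a clipped exponential with a single scalar multiplier $\nu$, chosen so that the entries sum to one:

$$
p_i = \min\bigl\{\max\bigl\{\exp\bigl((g_i - \nu)/\tau\bigr),\, a_i\bigr\},\, b_i\bigr\}, \qquad \sum_{i=1}^{K} p_i = 1 .
$$

Under this reading, $\bm{a} = \bm{0}$ and $\bm{b} = \bm{1}$ recover $\mathrm{Softmax}(\bm{g}/\tau)$, and, assuming $\bm{0} \le \bm{a} \le \bm{b}$, a solution exists exactly when $\sum_i a_i \le 1 \le \sum_i b_i$. Since $\sum_i p_i$ is nonincreasing in $\nu$, computing the output amounts to a one-dimensional root-finding problem.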

Paper Structure

This paper contains 75 sections, 174 equations, 4 figures, 6 tables, 6 algorithms.

Figures (4)

  • Figure 1: Probability calibration via box constraint. The left figure shows the predicted probabilities $\bm{p}$ and the target probabilities. The right figure shows the predicted probabilities with the box constraint $0.04 \le p_i \le 0.75$. Enforcing this box constraint mitigates the overconfidence of the top-label (argmax) prediction and the underconfidence of the other predictions. This mechanism directly enhances the reliability and trustworthiness of the prediction.
  • Figure 2: Comparison of the $\mathrm{Softmax}$ probabilities (blue point) with the $\mathrm{BCSoftmax}$ probabilities (red star) for the logit vector $\bm{g}=(-1.5, 1, -0.5)^\top$, lower-bound vector $\bm{a}=(0.05, 0.1, 0.0)^\top$, and upper-bound vector $\bm{b} = (1.0, 0.6, 0.5)^\top$ with $\tau=1$. The blue region is the probability simplex $\Delta^3$ over three classes, and the red region is its box-constrained subset induced by $\bm{a}$ and $\bm{b}$, that is, $\Delta^3 \cap [\bm{a}, \bm{b}]$. Because of the upper-bound constraint $b_2 = 0.6$, the $\mathrm{BCSoftmax}$ probabilities lie inside the red region, whereas the $\mathrm{Softmax}$ probabilities lie outside it.
  • Figure 3: An illustration of the property of uniform underconfidence (overconfidence) in the low- (high-) probability region.
  • Figure 4: Runtime comparison of the proposed exact algorithm with the existing method. The proposed method is 150-400$\times$ faster than the existing method.
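The Figure 2 setting can be reproduced numerically. The sketch below is a minimal illustration, not the authors' exact algorithm or the torchbcsoftmax API. It assumes BCSoftmax is the entropy-regularized maximizer over $\Delta^K \cap [\bm{a}, \bm{b}]$, uses the resulting clipped form $p_i = \min\{\max\{\exp((g_i - \nu)/\tau), a_i\}, b_i\}$, and finds the scalar $\nu$ by plain bisection, a swapped-in technique that is simpler but slower than an exact method. The function name `bcsoftmax` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def bcsoftmax(g: Sequence[float], a: Sequence[float], b: Sequence[float],
              tau: float = 1.0, iters: int = 200) -> List[float]:
    """Hypothetical sketch of box-constrained softmax, solved by bisection.

    Assumes BCSoftmax(g; a, b) = argmax_{p in simplex, a <= p <= b} <g, p> + tau * H(p),
    whose solution is p_i = clip(exp((g_i - nu) / tau), a_i, b_i) for one scalar nu.
    """
    n = len(g)
    if len(a) != n or len(b) != n:
        raise ValueError("g, a, and b must have the same length")
    if any(ai < 0 or ai > bi for ai, bi in zip(a, b)) or sum(a) > 1 or sum(b) < 1:
        raise ValueError("infeasible box: need 0 <= a <= b and sum(a) <= 1 <= sum(b)")
    z = [gi / tau for gi in g]

    def probs(nu: float) -> List[float]:
        # Exponent capped at 700 to avoid overflow; the clip to b_i that follows
        # makes the cap harmless for any realistic upper bound.
        return [min(max(math.exp(min(zi - nu, 700.0)), ai), bi)
                for zi, ai, bi in zip(z, a, b)]

    slack = 1.0 - sum(a)
    if slack <= 0.0:
        return list(a)  # sum(a) == 1: the box admits only p = a

    # The total mass sum(probs(nu)) is nonincreasing in nu.
    lo = min(z)                          # every exp term >= 1, so the total is >= 1
    hi = max(z) + math.log(n / slack)    # every exp term <= slack / n, so the total is <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs(0.5 * (lo + hi))


if __name__ == "__main__":
    # Figure 2 setting: g = (-1.5, 1, -0.5), a = (0.05, 0.1, 0), b = (1, 0.6, 0.5), tau = 1
    p = bcsoftmax([-1.5, 1.0, -0.5], a=[0.05, 0.1, 0.0], b=[1.0, 0.6, 0.5])
    print([round(x, 4) for x in p])  # [0.1076, 0.6, 0.2924]
```

Under this assumed definition, the second coordinate is clipped to $b_2 = 0.6$ and the remaining mass $0.4$ is split between classes 1 and 3 in the softmax ratio $e^{-1.5} : e^{-0.5}$; plain Softmax gives about $(0.063, 0.766, 0.171)$, which violates $b_2$. The feasibility check also constrains the uniform box of Figure 1: $0.04 \le p_i \le 0.75$ over $K$ classes is nonempty only when $0.04K \le 1 \le 0.75K$, that is, $2 \le K \le 25$.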

Theorems & Definitions (23)

  • Remark 3.1
  • Remark 3.2
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 13 more