
Consistency Calibration: Improving Uncertainty Calibration via Consistency among Perturbed Neighbors

Linwei Tao, Haolan Guo, Minjing Dong, Chang Xu

TL;DR

This paper proposes a post-hoc calibration method called Consistency Calibration (CC), which adjusts confidence based on the model's consistency across perturbed inputs, and shows that performing perturbations at the logit level significantly improves computational efficiency.

Abstract

Calibration is crucial in deep learning applications, especially in fields like healthcare and autonomous driving, where accurate confidence estimates are vital for decision-making. However, deep neural networks often suffer from miscalibration, and reliability diagrams and Expected Calibration Error (ECE) remain the only standard perspective for evaluating calibration performance. In this paper, we introduce the concept of consistency as an alternative perspective on model calibration, inspired by uncertainty estimation literature in large language models (LLMs). We highlight its advantages over the traditional reliability-based view. Building on this concept, we propose a post-hoc calibration method called Consistency Calibration (CC), which adjusts confidence based on the model's consistency across perturbed inputs. CC is particularly effective for local uncertainty estimation, as it requires no additional data samples or label information, instead generating input perturbations directly from the source data. Moreover, we show that performing perturbations at the logit level significantly improves computational efficiency. We validate the effectiveness of CC through extensive comparisons with various post-hoc and training-time calibration methods, demonstrating state-of-the-art performance on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet, as well as on long-tailed datasets like ImageNet-LT.
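The efficiency argument for logit-level perturbation can be illustrated with a minimal sketch: the network is run once per input, and all T "neighbors" are produced by adding noise to the resulting logits, so the extra cost is only T·N·K additions plus an argmax. The helper name `logit_consistency_confidence`, the uniform noise in [-eps, eps), and the defaults T = 100 and eps = 1.0 are hypothetical choices for illustration, not the paper's settings; using the agreement rate as confidence follows the notion of consistency in Proposition 1.

```python
import numpy as np


def logit_consistency_confidence(logits, num_neighbors=100, eps=1.0, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    Confidence of each sample = fraction of T logit-perturbed neighbors whose
    argmax matches the unperturbed prediction. Uniform noise in [-eps, eps)
    and the defaults for T and eps are assumptions for illustration.

    logits: array of shape (N, K), computed once by the trained network.
    Returns (pred, conf), each of shape (N,).
    """
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)
    pred = logits.argmax(axis=1)  # original predictions y_hat(x), shape (N,)
    # Perturb in logit space: no extra forward passes through the network.
    noise = rng.uniform(-eps, eps, size=(num_neighbors,) + logits.shape)
    perturbed_pred = (logits[None, :, :] + noise).argmax(axis=2)  # (T, N)
    # Agreement rate with the original prediction is used as confidence.
    conf = (perturbed_pred == pred[None, :]).mean(axis=0)
    return pred, conf
```

For logits `[[5.0, 0.0, 0.0], [1.0, 0.9, 0.0]]`, the first sample's top-1 margin (5.0) exceeds the largest possible noise swing (2·eps), so every neighbor agrees and its confidence is exactly 1.0; the second sample's margin is only 0.1, so its confidence falls strictly between 0 and 1.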

Paper Structure

This paper contains 33 sections, 2 theorems, 12 equations, 8 figures, 13 tables.

Key Result

Proposition 1

If a model is confident in its prediction, it should consistently output the same prediction when the input is slightly perturbed. The consistency $c$ of a sample $x$ is defined as

$$c = \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}\left[\hat{y}(\tilde{x}_t) = \hat{y}(x)\right],$$

where $T$ is the number of perturbed neighbors, $\hat{y}(\tilde{x}_t)$ is the predicted label for the perturbed input $\tilde{x}_t$, and the distance between the original sample $x$ and its perturbed version $\tilde{x}$ …
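Read as the fraction of perturbed neighbors that keep the original prediction, consistency is straightforward to compute once the labels are available. The sketch below assumes the predictions for the original inputs and for all T neighbors have already been obtained; the function name `consistency` and the (T, N) array layout are illustrative assumptions.

```python
import numpy as np


def consistency(pred_orig, pred_perturbed):
    """Consistency c per sample (illustrative sketch).

    pred_orig:      shape (N,)   predicted labels y_hat(x)
    pred_perturbed: shape (T, N) predicted labels y_hat(x_tilde_t), t = 1..T
    Returns shape (N,): fraction of the T neighbors agreeing with y_hat(x).
    """
    pred_orig = np.asarray(pred_orig)
    pred_perturbed = np.asarray(pred_perturbed)
    # Broadcasting compares every neighbor row against the original labels.
    agree = pred_perturbed == pred_orig[None, :]
    return agree.mean(axis=0)
```

For example, `consistency([3, 0], [[3, 0], [3, 0], [1, 0], [3, 0]])` returns `[0.75, 1.0]`: three of the four neighbors of the first sample keep label 3, and all four neighbors of the second keep label 0.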

Figures (8)

  • Figure 1: Illustrations of Consistency, Toy Data Distributions, and Ground Truth Uncertainty.
  • Figure 2: Comparison of Consistency vs. Reliability in Estimating Ground Truth Uncertainty
  • Figure 3: Evaluation of Consistency Calibration under Different Perturbation Settings.
  • Figure 4: Distribution of the max logit and second-largest logit for correct and incorrect predictions with more than 99% confidence, representing well-calibrated and miscalibrated samples on ResNet-50 across different datasets. The difference between the max logit and second-largest logit is significantly smaller for miscalibrated samples compared to well-calibrated samples.
  • Figure 5: Calibration performance of ResNet-50 on ImageNet-1K using AdaECE$\downarrow$, CECE$\downarrow$, NLL$\downarrow$, and Accuracy$\uparrow$. ECE, AdaECE, and CECE are reported with 15 bins. Colors in the legend represent different methods. Results are averaged over 5 runs.
  • ...and 3 more figures
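Figure 5 reports ECE and AdaECE with 15 bins. As a hedged reference point, the sketch below computes ECE with 15 equal-width confidence bins and AdaECE with 15 bins holding roughly equal numbers of samples. The (lo, hi] bin convention and the use of `np.array_split` on sorted confidences for equal-mass bins are assumptions; the paper's evaluation code may differ in such details, and CECE (classwise ECE) is not shown.

```python
import numpy as np


def ece(confidences, correct, n_bins=15):
    """Expected Calibration Error with equal-width bins over (0, 1]."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| weighted by the bin's sample share
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            total += in_bin.mean() * gap
    return total


def ada_ece(confidences, correct, n_bins=15):
    """Adaptive ECE: bins hold (roughly) equal numbers of samples."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(confidences)
    n = len(confidences)
    total = 0.0
    for idx in np.array_split(order, n_bins):  # empty chunks if n < n_bins
        if len(idx) > 0:
            gap = abs(correct[idx].mean() - confidences[idx].mean())
            total += len(idx) / n * gap
    return total
```

Equal-width bins can leave most samples in the top bins when a network is overconfident; equal-mass bins spread the samples evenly, which is the motivation usually given for reporting AdaECE alongside ECE.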

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2