Towards Certification of Uncertainty Calibration under Adversarial Attacks

Cornelius Emde; Francesco Pinto; Thomas Lukasiewicz; Philip H. S. Torr; Adel Bibi

TL;DR

This work tackles the problem of unreliable uncertainty estimates in neural classifiers under adversarial perturbations by introducing certified calibration. It develops a formal framework that bounds calibration error through two metrics: the certified Brier score (CBS) with a closed-form bound and the approximate certified calibration error (ACCE) via a mixed-integer program, facilitated by Gaussian smoothing certificates. The authors also introduce Adversarial Calibration Training (ACT) to actively improve calibrated uncertainty, proposing two variants (Brier-ACT and ACCE-ACT) that optimize losses under calibration adversaries using ADMM-based optimization. Empirical results across CIFAR-10, FashionMNIST, SVHN, CIFAR-100, and ImageNet show that ACT can reduce calibration error and Brier score at large certified radii, and that ACCE approximations via ADMM outperform baselines. Overall, the paper provides a practical pathway to certify and improve calibrated uncertainty in the presence of adversarial threats, with potential impact on safety-critical deployment where confidence reliability is as crucial as accuracy.
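The role of the Gaussian smoothing certificates can be illustrated with a standard randomized-smoothing result. If a confidence $z \in [0, 1]$ is the expectation of a bounded score under Gaussian noise with standard deviation $\sigma$, then within an $\ell_2$ ball of radius $\epsilon$ it stays in $[\Phi(\Phi^{-1}(z) - \epsilon/\sigma),\ \Phi(\Phi^{-1}(z) + \epsilon/\sigma)]$, where $\Phi$ is the standard normal CDF. The sketch below is a minimal illustration under that assumption. The function name `confidence_bounds` and its parameters are hypothetical, and the certificate used in the paper may be tighter or defined differently.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def confidence_bounds(z, radius, sigma, eps=1e-12):
    """Return (lower, upper) bounds on a Gaussian-smoothed confidence z.

    Assumes z is the expectation of a [0, 1]-valued score under
    N(0, sigma^2 I) noise and that the perturbation has L2 norm <= radius.
    """
    # inv_cdf is undefined at exactly 0 or 1, so clamp slightly inside (0, 1).
    z = min(max(z, eps), 1.0 - eps)
    shift = radius / sigma
    centre = _STD_NORMAL.inv_cdf(z)
    lower = _STD_NORMAL.cdf(centre - shift)
    upper = _STD_NORMAL.cdf(centre + shift)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical values: confidence 0.9, radius 0.25, smoothing sigma 0.5.
    print(confidence_bounds(0.9, radius=0.25, sigma=0.5))
```

Under this bound the interval widens as `radius / sigma` grows, so a larger $\sigma$ gives narrower confidence intervals at a fixed radius.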

Abstract

Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, certification methods have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. Furthermore, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. Specifically, we produce analytic bounds for the Brier score and approximate bounds via the solution of a mixed-integer program on the expected calibration error. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
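To make the two calibration measures concrete, the following minimal sketch computes a top-label Brier score and an expected calibration error (ECE) with equal-width bins. Both choices are assumptions made for illustration: the paper may use a different Brier variant or binning scheme, and the function names are hypothetical.

```python
def top_label_brier(conf, correct):
    """Mean squared gap between top-label confidence and 0/1 correctness."""
    return sum((z - c) ** 2 for z, c in zip(conf, correct)) / len(conf)


def expected_calibration_error(conf, correct, n_bins=10):
    """ECE with equal-width confidence bins on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for z, c in zip(conf, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(z * n_bins), n_bins - 1)
        bins[idx].append((z, c))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(z for z, _ in members) / len(members)
            accuracy = sum(c for _, c in members) / len(members)
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            ece += len(members) / len(conf) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions: confidences and whether each was correct.
    conf = [0.9, 0.9, 0.6, 0.6]
    correct = [1, 1, 1, 0]
    print(top_label_brier(conf, correct))
    print(expected_calibration_error(conf, correct))
```

A model is well calibrated in this sense when, within each bin, the fraction of correct predictions matches the mean confidence.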

Paper Structure (62 sections, 2 theorems, 32 equations, 19 figures, 10 tables, 2 algorithms)

Introduction
Confidence Calibration
Quantifying the Confidence-Accuracy Mismatch
Calibration under Attack
Certifying Calibration
Certifying Brier Score
Certifying Calibration Error
CCE as Mixed-Integer Program
ADMM Solver
Adversarial Calibration Training
Experiments
Certified Brier Score
ACCE
Adversarial Calibration Training
Related Work
...and 47 more sections

Key Result

Theorem 3.3

Let $\mathbf{z}$ be the output of a certified classifier, and let $\mathbf{l}$ and $\mathbf{u}$ be the lower and upper bounds on $\mathbf{z}$. Further, let $\mathbf{c} \in \mathbb{R}^N$ be the indicator that predictions are correct. The upper bound on the Brier score then has a closed form in $\mathbf{l}$, $\mathbf{u}$, and $\mathbf{c}$, where the products are element-wise. Refer to the paper's appendix for the proof.
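A worst-case bound of this kind can be sketched from the definitions alone. Assume the Brier score is the top-label form $\frac{1}{N}\sum_n (z_n - c_n)^2$ and that $0 \le l_n \le z_n \le u_n \le 1$. Each term is then maximized at $z_n = l_n$ when $c_n = 1$ and at $z_n = u_n$ when $c_n = 0$, which gives $\frac{1}{N}\left[\mathbf{c}^\top (\mathbf{1} - \mathbf{l})^2 + (\mathbf{1} - \mathbf{c})^\top \mathbf{u}^2\right]$ with element-wise squares. This expression is a reconstruction under those assumptions, not a quotation of the theorem, and the function name `brier_upper_bound` is hypothetical.

```python
def brier_upper_bound(lower, upper, correct):
    """Worst-case top-label Brier score given per-sample confidence bounds.

    lower, upper: bounds on each confidence, with 0 <= lower <= upper <= 1.
    correct: 0/1 indicators that each (certified) prediction is correct.
    """
    total = 0.0
    for l, u, c in zip(lower, upper, correct):
        # A correct prediction (c = 1) is hurt most by the lowest admissible
        # confidence; an incorrect one (c = 0) by the highest.
        total += c * (1.0 - l) ** 2 + (1 - c) * u ** 2
    return total / len(correct)


if __name__ == "__main__":
    # Hypothetical certificates for two predictions.
    lower = [0.7, 0.4]
    upper = [0.95, 0.8]
    correct = [1, 0]
    print(brier_upper_bound(lower, upper, correct))
```

Because the per-sample terms are independent once correctness is certified, the maximum is obtained sample by sample without any search.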

Figures (19)

Figure 1: This work proposes a certificate (upper bound) on the calibration error of a classifier under adversaries. Each box on the left represents one prediction from a certified model: the range of each box represents a certificate on the confidence score under adversaries, and the color indicates whether the prediction is certifiably correct. We translate these certificates into a certificate on the calibration error (right), i.e. a worst case under attack.
Figure 2: Certified Brier scores on ImageNet. For small radii, small smoothing $\sigma$ outperforms larger ones, but as radii increase, large $\sigma$ outperform smaller $\sigma$.
Figure 3: The ACCE returned by ADMM, dECE, and the Brier confidences are shown here for ImageNet. ADMM is the most effective method, as it uniformly yields the largest bounds.
Figure 4: This figure shows the impact of fine-tuning a model via adversarial calibration training (ACT) on certified accuracy and approximate certified calibration error (ACCE). Each sub-figure presents an adversarial training baseline ("AT") at the origin, certified at the given radius. The changes in both metrics after multiple fine-tunings with Brier-ACT and ACCE-ACT are depicted. The ideal corner is indicated by $\star$. ACCE-ACT significantly improves certified calibration at larger radii.
Figure 5: This visualization shows that we can fix either the accuracy or the calibration and construct a dataset on which the other quantity takes opposite values. The example considers a binary classification problem. The empty circle marks the point of perfect calibration and the full circle marks the calibration on the data. The length of the line between them is the calibration error.
...and 14 more figures

Theorems & Definitions (5)

Theorem 3.3
Definition 3.4
Theorem 3.5
proof
proof
