Probabilistic Robustness for Free? Revisiting Training via a Benchmark

Yi Zhang, Zheng Wang, Zhen Chen, Wenjie Ruan, Qing Guo, Siddartha Khastgir, Carsten Maple, Xingyu Zhao

TL;DR

This work introduces PRBench, the first benchmark dedicated to probabilistic robustness (PR) in deep learning, contrasting it with traditional adversarial robustness (AR). PRBench evaluates training methods across 222 trained models and 7 datasets, covering adversarial training (AT), PR-targeted risk-based training (RT) methods, and a hybrid AT-PR approach, with metrics spanning AR, PR, generalization error (GE), clean accuracy, and training efficiency. The empirical results show that AT methods typically improve PR as a by-product of AR optimization, often at little extra cost, while PR-targeted methods tend to yield lower GE and higher clean accuracy at the expense of efficiency or AR; the hybrid AT-PR method offers a balanced but more costly approach. The authors also provide a theoretical framework based on uniform stability that links GE to training choices, and show that RT-based methods can lead to smoother optimization and reduced GE, which supports the observed empirical trends. A public leaderboard and open-source materials accompany PRBench, enabling ongoing, evidence-based comparisons and advancing the practical study of PR in real-world, safety-critical applications.

Abstract

Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the probability that predictions remain correct under stochastic perturbations. While PR is widely regarded as a practical complement to AR, dedicated training methods for improving PR are still relatively underexplored, albeit with emerging progress. Among the few PR-targeted training methods, we identify three limitations: (i) non-comparable evaluation protocols; (ii) limited comparisons to strong AT baselines despite anecdotal PR gains from AT; and (iii) no unified framework to compare the generalization of these methods. Thus, we introduce PRBench, the first benchmark dedicated to evaluating improvements in PR achieved by different robustness training methods. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalization error (GE). We also provide a theoretical analysis of the GE of PR performance across different training methods. The main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in terms of improving both AR and PR performance across diverse hyperparameter settings, while PR-targeted training methods consistently yield lower GE and higher clean accuracy. A leaderboard comprising 222 trained models across 7 datasets and 10 model architectures is publicly available at https://tmpspace.github.io/PRBenchLeaderboard/.
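
To make the statistical view of PR concrete, the following minimal sketch estimates, by Monte Carlo sampling, the probability that a prediction stays correct under random perturbations. It is an illustration under stated assumptions and not the PRBench implementation: the names `estimate_pr` and `toy_predict`, the uniform perturbation over an $\ell_\infty$ ball of radius `gamma`, and the sample count are all hypothetical choices.

```python
import random


def estimate_pr(predict, x, y, gamma, n_samples=1000, seed=0):
    """Monte Carlo estimate of the probability that predict(x + delta) == y.

    Assumption: delta is drawn uniformly from the L-infinity ball of
    radius gamma around x (each coordinate perturbed independently).
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        # Sample one stochastic perturbation and apply it to the input.
        perturbed = [xi + rng.uniform(-gamma, gamma) for xi in x]
        if predict(perturbed) == y:
            correct += 1
    return correct / n_samples


def toy_predict(x):
    """Hypothetical stand-in classifier: class 1 if the mean exceeds 0.5."""
    return 1 if sum(x) / len(x) > 0.5 else 0


if __name__ == "__main__":
    # Far from the decision boundary: every perturbed copy is still correct.
    print(estimate_pr(toy_predict, [0.9] * 4, y=1, gamma=0.1))
    # On the decision boundary: only a fraction of perturbed copies are correct.
    print(estimate_pr(toy_predict, [0.5] * 4, y=1, gamma=0.1))
```

Averaging this per-input estimate over a test set gives a dataset-level PR figure, whereas AR would instead ask whether any single perturbation within the radius flips the prediction.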

Paper Structure

This paper contains 31 sections, 12 theorems, 83 equations, 6 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Given the Lipschitz and smoothness assumption in Assumption amp_on_model for classifier $f$, we show that the surrogate loss $\max_{\Vert \bm{\delta} \Vert \leq \gamma}\mathcal{L}_{\textit{CE}}(f(\mathbf{p}(\bm{x} + \bm{\delta}, \bm{\theta})), y)$ is $\varphi$-Lipschitz and $\phi$-approximate $\psi$-smooth. We run SGD with learning rate $\alpha_t \leq c/t$ for $T$ steps with a constant $c$ such that $1/c

Figures (6)

  • Figure 1: Comparison of Adversarial (a) and Probabilistic Robustness (b)
  • Figure 2: (a) Comparison of training methods (AT and RT) in terms of AR (AA) and PR ($\textit{PR}_{\mathcal{D}}^{\text{Uniform}}(\gamma)$) performance across various datasets. (b) Composite robustness scores of different training methods, aggregated over all test datasets and model architectures. (c) $\textit{PR}_{\mathcal{D}}^{\text{Uniform}}(\gamma)$ of ResNet-18 trained with different training methods on CIFAR-10 under varying $\gamma$. (d) $\textit{PR}_{\mathcal{D}}^{\text{Laplace}}(\gamma)$ for ResNet-18 trained with corruption training and PGD models on CIFAR-10 across various $\gamma$. (e) $\textit{ProbAcc}(\rho,\gamma=0.03)$ for ResNet-18 trained with different training methods on CIFAR-10 with respect to different robustness tolerance levels $\rho$. (f) GE of $\textit{PR}_{\mathcal{D}}^{\text{Uniform}}(\gamma)$ for ResNet-18 trained with different training methods on CIFAR-10 with respect to different $\gamma$. More experimental results are deferred to the appendix on additional experiments.
  • Figure 3: $\textit{PR}_{\mathcal{D}}^{\text{Uniform}}(\gamma)$ for different models (ResNet-18, ResNet-34, WRN-28-10, VGG-19 and SimpleCNN) trained with various training methods (both AT and PR-targeted) on different datasets (CIFAR-10, CIFAR-100, CINIC-10, SVHN, MNIST, TinyImageNet, ImageNet-50), evaluated under varying perturbation radii $\gamma$.
  • Figure 4: $\textit{ProbAcc}(\rho, \gamma=0.03)$ for different models (ResNet-18, ResNet-34, WRN-28-10, VGG-19) trained with various training methods (both AT and PR-targeted) on different datasets (CIFAR-10, CIFAR-100, CINIC-10, SVHN, MNIST, TinyImageNet, ImageNet-50), evaluated under varying robustness tolerance levels $\rho$.
  • Figure 5: GE of $\textit{PR}_{\mathcal{D}}^{\text{Uniform}}(\gamma)$ for different models (ResNet-18, ResNet-34, WRN-28-10, VGG-19 and SimpleCNN) trained with various training methods (both AT and PR-targeted) on different datasets (CIFAR-10, CIFAR-100, CINIC-10, SVHN, MNIST, TinyImageNet, ImageNet-50), evaluated under varying perturbation radii $\gamma$.
  • ...and 1 more figure
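
The captions above report $\textit{ProbAcc}(\rho,\gamma)$ against a robustness tolerance level $\rho$, but this excerpt does not define the metric. The sketch below assumes the reading common in the PR literature: the fraction of test inputs whose per-input probability of a correct prediction under perturbations of radius $\gamma$ is at least $1-\rho$. The function name `prob_acc` and its input format are hypothetical, and the paper's exact definition may differ.

```python
def prob_acc(per_input_pr, rho):
    """Fraction of inputs whose per-input PR estimate is at least 1 - rho.

    Assumption: per_input_pr holds, for each test input, an estimate of the
    probability that the prediction stays correct under random perturbations
    of a fixed radius gamma. This is an assumed reading of ProbAcc, not a
    definition taken from the source.
    """
    if not per_input_pr:
        raise ValueError("per_input_pr must not be empty")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    threshold = 1.0 - rho
    # Count inputs that tolerate at most a rho fraction of failing perturbations.
    robust = sum(1 for p in per_input_pr if p >= threshold)
    return robust / len(per_input_pr)


if __name__ == "__main__":
    estimates = [1.0, 0.95, 0.5, 0.0]
    for rho in (0.0, 0.1, 0.5):
        print(rho, prob_acc(estimates, rho))
```

Under this reading, a larger $\rho$ tolerates more failing perturbations per input, so the metric is non-decreasing in $\rho$.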

Theorems & Definitions (25)

  • Definition 1: Probabilistic Robustness
  • Definition 2: Risk-based Training (RT)
  • Definition 3: Approximate Smoothness (xiao2022stability)
  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Remark 1
  • Lemma 1
  • Proof 1
  • Lemma 2: Gradient of the Cross-Entropy Loss
  • ...and 15 more