Improving Adversarial Training using Vulnerability-Aware Perturbation Budget

Olukorede Fakorede, Modeste Atsague, Jin Tian

TL;DR

This work observes that the fixed perturbation budget used in adversarial training ignores sample-specific vulnerability. It introduces two vulnerability-aware strategies for setting perturbation radii, Margin-Weighted Perturbation Budget (MWPB) and Standard-Deviation-Weighted Perturbation Budget (SDWPB), which adapt the inner maximization to each sample using a vulnerability score derived from a margin or a standardized logit spread. The proposed methods integrate with standard AT, TRADES, and MART and show improved robustness against strong white-box and black-box attacks on CIFAR-10, SVHN, and Tiny ImageNet; a two-phase training regime is used to manage the challenges of larger radii. Overall, vulnerability-aware perturbation budgets make AT more effective by aligning perturbation strength with per-example vulnerability.

Abstract

Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks. Generally, AT involves training DNN models with adversarial examples obtained within a pre-defined, fixed perturbation bound. Notably, the individual natural examples from which these adversarial examples are crafted exhibit varying degrees of intrinsic vulnerability, and as such, crafting adversarial examples with a fixed perturbation radius for all instances may not sufficiently unleash the potency of AT. Motivated by this observation, we propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to adversarial examples used for AT, named Margin-Weighted Perturbation Budget (MWPB) and Standard-Deviation-Weighted Perturbation Budget (SDWPB). The proposed methods assign perturbation radii to individual adversarial samples based on the vulnerability of their corresponding natural examples. Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
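To make the idea of a per-sample perturbation radius concrete, the following is a minimal NumPy sketch of a margin-based budget. It is an illustration under stated assumptions, not the paper's definition: the function name `margin_weighted_eps`, the exponential mapping, the scale `alpha`, the base radius `8/255`, and the direction (a larger margin gives a larger radius) are all hypothetical choices that the text above does not specify.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def margin_weighted_eps(logits, labels, base_eps=8 / 255, alpha=1.0):
    """Return one perturbation radius per sample (hypothetical form).

    The margin is the true-class probability minus the largest
    other-class probability, so it lies in [-1, 1]. The exponential
    mapping and the sign convention are assumptions of this sketch.
    """
    probs = softmax(logits)
    idx = np.arange(len(labels))
    true_p = probs[idx, labels]

    # Mask out the true class before taking the runner-up probability.
    other = probs.copy()
    other[idx, labels] = -np.inf
    margin = true_p - other.max(axis=1)

    # A zero margin reproduces the fixed budget base_eps.
    return base_eps * np.exp(alpha * margin)


# Usage: three samples, all with true label 0.
logits = np.array([[5.0, 0.0, 0.0],   # confidently correct
                   [0.0, 0.0, 0.0],   # on the decision boundary
                   [0.0, 5.0, 0.0]])  # confidently wrong
labels = np.array([0, 0, 0])
eps = margin_weighted_eps(logits, labels)
```

In an adversarial training loop, `eps` would replace the single fixed radius when projecting each sample's perturbation in the inner maximization; how the radii are clipped or scheduled is left open here.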

Paper Structure

This paper contains 34 sections, 4 theorems, 12 equations, 2 figures, 13 tables, and 2 algorithms.

Key Result

Theorem 1

Let $\mathcal{L}$ and $f_{\theta}(\cdot)$ denote the cross-entropy loss function and the predictions of the model, respectively. Consider two natural input-label pairs $(x_1, y_1)$ and $(x_2, y_2)$ such that $\mathcal{L}(x_1, y_1) > \mathcal{L}(x_2, y_2)$. The following holds for first-order adversarial

Figures (2)

  • Figure 1: Plots showing the distribution of perturbation radii for MWPB-AT, MWPB-TRADES and MWPB-MART respectively.
  • Figure 2: Plots showing the distribution of perturbation radii for SDWPB-AT, SDWPB-TRADES and SDWPB-MART respectively.

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Lemma 1: pang2020boosting
  • Lemma 2: katharopoulos2017biased
  • Proof 1
  • Proof 2