Improving Adversarial Training using Vulnerability-Aware Perturbation Budget
Olukorede Fakorede, Modeste Atsague, Jin Tian
TL;DR
This work observes that a fixed perturbation budget in adversarial training (AT) ignores differences in vulnerability between samples. It introduces two vulnerability-aware strategies for setting perturbation radii, Margin-Weighted Perturbation Budget (MWPB) and Standard-Deviation-Weighted Perturbation Budget (SDWPB), which adapt the inner maximization to each sample using a vulnerability score derived from a margin or a standardized logit spread. The methods integrate with standard AT, TRADES, and MART, and improve robustness against strong white-box and black-box attacks on CIFAR-10, SVHN, and Tiny ImageNet. A two-phase training regime is used to manage the difficulty of training with larger radii. Overall, vulnerability-aware perturbation budgets make AT more effective by matching perturbation strength to per-example vulnerability.
Abstract
Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks. Generally, AT involves training DNN models with adversarial examples obtained within a pre-defined, fixed perturbation bound. Notably, the individual natural examples from which these adversarial examples are crafted exhibit varying degrees of intrinsic vulnerability, and as such, crafting adversarial examples with a fixed perturbation radius for all instances may not sufficiently unleash the potency of AT. Motivated by this observation, we propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to adversarial examples used for AT, named Margin-Weighted Perturbation Budget (MWPB) and Standard-Deviation-Weighted Perturbation Budget (SDWPB). The proposed methods assign perturbation radii to individual adversarial samples based on the vulnerability of their corresponding natural examples. Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
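
To make the idea of a per-sample perturbation radius concrete, the following is a minimal sketch, not the authors' implementation. It assumes the vulnerability score is computed on softmax probabilities and that the budget is scaled exponentially, so that a larger score yields a larger radius. The function names, the exponential form, the sign convention, and the parameters `base_eps` and `alpha` are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin_score(logits, label):
    """Probability of the true class minus the largest other-class probability.

    Positive when the natural example is classified correctly with some slack,
    negative when another class is preferred. (Assumed definition.)
    """
    probs = softmax(logits)
    best_other = max(p for j, p in enumerate(probs) if j != label)
    return probs[label] - best_other


def spread_score(logits):
    """Population standard deviation of the softmax probabilities.

    Zero for a uniform prediction, larger for a peaked one. (Assumed definition.)
    """
    probs = softmax(logits)
    mean = sum(probs) / len(probs)
    return math.sqrt(sum((p - mean) ** 2 for p in probs) / len(probs))


def weighted_budget(score, base_eps=8 / 255, alpha=1.0):
    """Scale the base radius by exp(alpha * score). (Assumed weighting form.)"""
    return base_eps * math.exp(alpha * score)


# Hypothetical logits for three natural examples with true label 0.
confident = [5.0, 0.0, 0.0]   # far from the decision boundary
ambiguous = [0.0, 0.0, 0.0]   # uniform prediction
wrong = [0.0, 5.0, 0.0]       # another class is preferred

eps_confident = weighted_budget(margin_score(confident, 0))
eps_ambiguous = weighted_budget(margin_score(ambiguous, 0))
eps_wrong = weighted_budget(margin_score(wrong, 0))
```

Under these assumptions, the confidently classified example receives a radius above the base budget, the ambiguous example receives exactly the base budget, and the misclassified example receives a smaller one. In an AT loop, the resulting per-sample radius would replace the fixed bound used when crafting each adversarial example.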
