Standard-Deviation-Inspired Regularization for Improving Adversarial Robustness

Olukorede Fakorede, Modeste Atsague, Jin Tian

TL;DR

This work introduces a standard-deviation-inspired (SDI) regularization term that improves adversarial robustness and generalization when added to existing adversarial training (AT) frameworks. The SDI measure $M_{SDI}$ captures the dispersion of a model's output probabilities around the true-class probability for each example, and it is selectively regularized through the loss term $L_{SDI}$ to maximize the true-class probability gap. Applying SDI to standard AT and to TRADES (as AT-SDI and TRADES-SDI), the authors report improved robustness against strong attacks such as CW and AutoAttack across multiple datasets and backbones, with only modest computational overhead. They also show that the SDI measure can be used to craft adversarial examples, and that SDI regularization narrows the gap between robustness to PGD and robustness to other attacks, which indicates better generalization. The findings suggest that SDI regularization provides a complementary, attack-agnostic signal to AT and TRADES, broadening robustness without severe costs or gradient obfuscation.
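To make the idea of "dispersion around the true-class probability" concrete, here is a minimal sketch in plain Python. The exact form of $M_{SDI}$ is not given in this summary, so the function `sdi_like_measure` below is an assumption: it replaces the mean in an ordinary standard deviation with the true-class probability and averages over the remaining classes. The helper names and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sdi_like_measure(probs, true_class):
    """Assumed form, not taken from the paper: root-mean-square deviation
    of the other class probabilities from the true-class probability."""
    p_true = probs[true_class]
    others = [p for k, p in enumerate(probs) if k != true_class]
    return math.sqrt(sum((p - p_true) ** 2 for p in others) / len(others))

# Toy logits (hypothetical): a confident and an uncertain prediction for class 0.
confident = softmax([6.0, 0.0, 0.0, 0.0])
uncertain = softmax([0.5, 0.0, 0.0, 0.0])
uniform = [0.25, 0.25, 0.25, 0.25]

m_confident = sdi_like_measure(confident, 0)
m_uncertain = sdi_like_measure(uncertain, 0)
m_uniform = sdi_like_measure(uniform, 0)
```

Under this assumed form, the measure is zero for a uniform output and grows as the true-class probability separates from the rest. Because the deviations are squared, it also grows when the true-class probability falls far below the others, so on its own it does not indicate the direction of the gap.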

Abstract

Adversarial Training (AT) has been demonstrated to improve the robustness of deep neural networks (DNNs) against adversarial attacks. AT is a min-max optimization procedure in which adversarial examples are generated to train a more robust DNN. The inner maximization step of AT increases the losses of inputs with respect to their actual classes. The outer minimization minimizes the losses on the adversarial examples obtained from the inner maximization. This work proposes a standard-deviation-inspired (SDI) regularization term to improve adversarial robustness and generalization. We argue that the inner maximization in AT is similar to minimizing a modified standard deviation of the model's output probabilities. Moreover, we suggest that maximizing this modified standard deviation can complement the outer minimization of the AT framework. To support our argument, we experimentally show that the SDI measure can be used to craft adversarial examples. Additionally, we demonstrate that combining the SDI regularization term with existing AT variants enhances the robustness of DNNs against stronger attacks, such as CW and AutoAttack, and improves generalization.
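The min-max structure described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a toy linear softmax classifier, uses a standard sign-gradient PGD attack under an $l_{\infty}$ budget for the inner maximization, and takes one plain gradient step on the weights for the outer minimization. All names, weights, and hyperparameters are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    """Class probabilities of a linear softmax model; W holds one row per class."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def ce_loss(W, x, y):
    """Cross-entropy loss of input x with respect to its true class y."""
    return -math.log(predict(W, x)[y])

def error_vector(W, x, y):
    """Gradient of the loss with respect to the logits: p - onehot(y)."""
    return [p - (1.0 if k == y else 0.0) for k, p in enumerate(predict(W, x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(W, x, y, eps, step, iters):
    """Inner maximization: raise the loss while staying within eps of x (l_inf)."""
    x_adv = list(x)
    for _ in range(iters):
        err = error_vector(W, x_adv, y)
        grad_x = [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad_x)]
        # Project back onto the l_inf ball of radius eps around the clean input.
        x_adv = [min(max(xa, xj - eps), xj + eps) for xa, xj in zip(x_adv, x)]
    return x_adv

def adversarial_training_step(W, x, y, eps, step, iters, lr):
    """Outer minimization: one gradient step on the loss at the adversarial example."""
    x_adv = pgd_attack(W, x, y, eps, step, iters)
    err = error_vector(W, x_adv, y)
    W_new = [[w - lr * err[k] * xj for w, xj in zip(row, x_adv)]
             for k, row in enumerate(W)]
    return W_new, x_adv

# Toy two-class problem (hypothetical weights and input).
W = [[1.0, -0.5], [-1.0, 0.5]]
x, y = [0.6, 0.2], 0
eps = 0.1

W_new, x_adv = adversarial_training_step(W, x, y, eps, step=0.025, iters=10, lr=0.5)
clean_loss = ce_loss(W, x, y)
adv_loss = ce_loss(W, x_adv, y)
loss_after_update = ce_loss(W_new, x_adv, y)
```

In this toy setting the attack raises the loss on the true class (`adv_loss` exceeds `clean_loss`), and the weight update then lowers the loss on that same adversarial example. A regularizer such as the SDI term would enter as an additional term in the outer-step objective.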

Paper Structure

This paper contains 23 sections, 9 equations, 1 figure, 11 tables, 2 algorithms.

Figures (1)

  • Figure 1: Comparison of natural CIFAR-10 images with the adversarial perturbations and adversarial examples obtained by the SDI-PGD attack defined in Eq. (\ref{eq:SDI-PGD_attack}). The first row shows natural CIFAR-10 images with their correct labels. The second and third rows show the corresponding $\ell_{\infty}$ adversarial perturbations, with $\epsilon = 8/255$ and $\epsilon = 0.2$ respectively. The fourth row shows the corresponding adversarial examples and their incorrect labels for each image in the first row.