Stability and Generalization in Free Adversarial Training

Xiwei Cheng, Kexin Fu, Farzan Farnia

TL;DR

The paper analyzes how min-max optimization strategies in adversarial training (AT) affect generalization, comparing vanilla AT with Free AT through the lens of algorithmic stability. It proves stability-based bounds in the nonconvex-nonconcave setting indicating that Free AT can achieve a smaller generalization gap ${\mathcal{E}_\text{gen}}$ than vanilla AT because it updates weights and perturbations simultaneously, and it supports this with experiments across multiple datasets and attacks. A variant, Free-TRADES, is proposed and shown to further reduce the generalization gap while preserving robustness. The results highlight stability-driven insights into robust generalization and suggest practical benefits of simultaneous min-max updates for adversarial learning. Overall, the work links algorithmic stability to improved generalization in adversarial settings and offers a path to more data-efficient, robust models.

Abstract

While adversarial training methods have significantly improved the robustness of deep neural networks against norm-bounded adversarial perturbations, the generalization gap between their performance on training and test data is considerably greater than that of standard empirical risk minimization. Recent studies have aimed to connect the generalization properties of adversarially trained classifiers to the min-max optimization algorithm used in their training. In this work, we analyze the interconnections between generalization and optimization in adversarial training using the algorithmic stability framework. Specifically, our goal is to compare the generalization gap of neural networks trained using the vanilla adversarial training method, which fully optimizes perturbations at every iteration, with the free adversarial training method, which simultaneously optimizes norm-bounded perturbations and classifier parameters. We prove bounds on the generalization error of these methods, indicating that the free adversarial training method may exhibit a lower generalization gap between training and test samples due to its simultaneous min-max optimization of classifier weights and perturbation variables. We conduct several numerical experiments to evaluate the train-to-test generalization gap in vanilla and free adversarial training methods. Our empirical findings also suggest that the free adversarial training method could lead to a smaller generalization gap over a similar number of training iterations. The paper code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.
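
To make the contrast between the two update schemes concrete, the following is a minimal NumPy sketch on a linear logistic classifier with $\mathcal{L}_\infty$-bounded perturbations. It is not taken from the paper's repository: the function names, the step sizes, the number of PGD steps, and the replay count `m` are illustrative assumptions. The vanilla step runs an inner loop to (approximately) maximize over the perturbation before each weight update, while the free step reuses one gradient pass to update the weights and the perturbation together.

```python
import numpy as np

def loss_and_grads(w, x, y, delta):
    """Mean logistic loss on perturbed inputs, with gradients w.r.t. w and delta.

    x: (n, d) inputs, y: (n,) labels in {-1, +1}, delta: (n, d) perturbations.
    """
    x_adv = x + delta
    margin = y * (x_adv @ w)
    loss = np.mean(np.logaddexp(0.0, -margin))
    coeff = -y / (1.0 + np.exp(margin))           # d loss_i / d logit_i
    grad_w = (coeff[:, None] * x_adv).mean(axis=0)
    grad_delta = coeff[:, None] * w[None, :]      # per-example input gradient
    return loss, grad_w, grad_delta

def vanilla_at_step(w, x, y, eps, lr, pgd_steps=10):
    """Vanilla AT (sketch): run PGD on delta first, then take one weight step."""
    pgd_lr = 2.5 * eps / pgd_steps                # assumed heuristic step size
    delta = np.zeros_like(x)
    for _ in range(pgd_steps):
        _, _, g_delta = loss_and_grads(w, x, y, delta)
        delta = np.clip(delta + pgd_lr * np.sign(g_delta), -eps, eps)
    _, g_w, _ = loss_and_grads(w, x, y, delta)
    return w - lr * g_w

def free_at_step(w, delta, x, y, eps, lr, m=4):
    """Free AT (sketch): replay the batch m times, updating w and delta together."""
    for _ in range(m):
        _, g_w, g_delta = loss_and_grads(w, x, y, delta)
        w = w - lr * g_w                                             # descent on weights
        delta = np.clip(delta + eps * np.sign(g_delta), -eps, eps)   # ascent on delta
    return w, delta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 5))
    y = np.where(x[:, 0] > 0, 1.0, -1.0)
    w = np.zeros(5)
    w_vanilla = vanilla_at_step(w, x, y, eps=0.1, lr=0.5)
    w_free, delta = free_at_step(w, np.zeros_like(x), x, y, eps=0.1, lr=0.5)
    print(w_vanilla, w_free)
```

In this sketch one vanilla step costs `pgd_steps + 1` gradient passes for a single weight update, whereas one free step performs `m` weight updates in `m` passes and carries the perturbation across replays.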

Paper Structure (20 sections, 12 theorems, 60 equations, 12 figures, 6 tables, 4 algorithms)

Key Result

Theorem 1

If a randomized algorithm ${A}$ is $\epsilon$-uniformly stable, then the expected generalization risk satisfies
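
The displayed inequality is not reproduced in this listing. As a hedged reference point, the classical uniform-stability result (Hardt, Recht, and Singer, 2016) takes the form below. The symbols $S$, $S'$, $\ell$, $R$, and $R_S$ are notation assumed here, and the paper's own statement, which presumably uses the adversarial loss, may differ in form.

```latex
% Assumed notation: S, S' are datasets of size n differing in one sample;
% \ell is the loss, R the population risk, R_S the empirical risk on S.
% \epsilon-uniform stability:
\[
  \sup_{z}\; \mathbb{E}_{A}\bigl[\ell(A(S); z) - \ell(A(S'); z)\bigr] \le \epsilon
  \quad \text{for all such } S, S'.
\]
% Resulting bound on the expected generalization gap:
\[
  \Bigl|\, \mathbb{E}_{S,A}\bigl[ R(A(S)) - R_S(A(S)) \bigr] \Bigr| \le \epsilon,
  \qquad
  R(w) = \mathbb{E}_{z}\bigl[\ell(w; z)\bigr], \quad
  R_S(w) = \frac{1}{n}\sum_{i=1}^{n} \ell(w; z_i).
\]
```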

Figures (12)

  • Figure 1: Learning curves of different algorithms for a ResNet18 model adversarially trained against ${\mathcal{L}_2}$ and ${\mathcal{L}_\infty}$ attacks on CIFAR-10. The free curves are scaled horizontally by a factor of $m$.
  • Figure 2: Robust accuracy of ResNet18 models adversarially trained by vanilla, fast, and free algorithms against the square attack on CIFAR-10. The left figure applies ${\mathcal{L}_2}$ attacks of radius ranging from 64 to 192, and the right figure applies ${\mathcal{L}_\infty}$ attacks of radius ranging from 1 to 9.
  • Figure 3: Robust accuracy against transferred attacks designed for another independently trained robust model. The left figure applies ${\mathcal{L}_2}$ attacks of radius ranging from 64 to 192, and the right figure applies ${\mathcal{L}_\infty}$ attacks of radius ranging from 1 to 9.
  • Figure 4: Adversarial generalization gap of ResNet18 models adversarially trained by vanilla, fast, and free algorithms for a fixed number of steps on a subset of CIFAR-10.
  • Figure 5: Learning curves of different algorithms for a ResNet18 model adversarially trained against ${\mathcal{L}_2}$-norm and ${\mathcal{L}_\infty}$-norm attacks on CIFAR-10 and CIFAR-100. The free curves are scaled horizontally by a factor of $m$ for clear comparison.
  • ...and 7 more figures

Theorems & Definitions (25)

  • Definition 1
  • Theorem 1
  • Proof
  • Theorem 2: Stability generalization bound of ${A_\text{Vanilla}}$
  • Theorem 3: Lower bound on stability (Theorem 1 in Xing et al., 2021; Theorem 5.2 in Xiao et al., 2022)
  • Theorem 4: Stability generalization bound of ${A_\text{Free}}$
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • ...and 15 more