Conserve-Update-Revise to Cure Generalization and Robustness Trade-off in Adversarial Training

Shruthi Gowda, Bahram Zonooz, Elahe Arani

TL;DR

Adversarial training improves robustness but hurts standard generalization, creating a robustness-generalization gap often worsened by robust overfitting. The authors analyze layer-wise learning dynamics during the transition from standard to adversarial training and find that selective conservation and updating of layers, guided by gradient prominence, can improve learning efficiency. They propose CURE, a Conserve-Update-Revise framework that uses a gradient-based gate to conserve useful natural-data knowledge, update layers that handle adversarial data, and revise consolidated knowledge through a revision model. Across CIFAR-10/100 and SVHN on multiple architectures and attacks, CURE achieves superior trade-offs between natural and robust accuracy, reduces robust overfitting, and shows robustness to natural corruptions, underscoring the value of selective training schemes for generalization and security.
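The gradient-based gate and the revision model described above can be sketched on a single weight tensor. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes that gradient prominence is approximated by absolute gradient magnitude, that the gate updates the top `update_fraction` of weights and conserves the rest, and that the revision model is an exponential moving average (EMA) of the working weights. `prominence_mask`, `gated_sgd_step`, `revise`, `update_fraction`, and `decay` are hypothetical names.

```python
import numpy as np


def prominence_mask(grad: np.ndarray, update_fraction: float) -> np.ndarray:
    """True for the top `update_fraction` of entries ranked by |grad|.

    Assumption: gradient prominence is approximated by absolute gradient
    magnitude; the criterion used by CURE may differ.
    """
    if not 0.0 <= update_fraction <= 1.0:
        raise ValueError("update_fraction must lie in [0, 1]")
    scores = np.abs(grad).ravel()
    k = int(round(update_fraction * scores.size))
    mask = np.zeros(scores.size, dtype=bool)
    if k > 0:
        # argpartition places the k largest scores in the last k positions.
        top = np.argpartition(scores, scores.size - k)[scores.size - k:]
        mask[top] = True
    return mask.reshape(grad.shape)


def gated_sgd_step(w, grad, lr, update_fraction):
    """Update prominent weights with SGD; conserve (keep unchanged) all others."""
    mask = prominence_mask(grad, update_fraction)
    return np.where(mask, w - lr * grad, w)


def revise(revision_w, w, decay=0.999):
    """Assumed revision model: exponential moving average of the working weights."""
    return decay * revision_w + (1.0 - decay) * w


# Usage on a toy 4-element tensor: only the two largest-|g| entries are updated.
w = np.ones(4)
g = np.array([0.1, -2.0, 0.5, 3.0])
w = gated_sgd_step(w, g, lr=0.1, update_fraction=0.5)  # entries 1 and 3 change
revision_w = revise(np.ones(4), w, decay=0.9)           # slowly tracks w
```

Using `np.where` leaves conserved weights exactly as they were, which is the intent of the "conserve" step; the revision weights only drift toward the working weights at a rate set by `decay`.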

Abstract

Adversarial training improves the robustness of neural networks against adversarial attacks, albeit at the expense of standard generalization, giving rise to a trade-off between standard and robust generalization. To unveil the underlying factors driving this phenomenon, we examine the layer-wise learning capabilities of neural networks during the transition from a standard to an adversarial setting. Our empirical findings demonstrate that selectively updating specific layers while preserving others can substantially enhance the network's learning capacity. We therefore propose CURE, a novel training framework that leverages a gradient prominence criterion to perform selective conservation, updating, and revision of weights. Importantly, CURE is designed to be dataset- and architecture-agnostic, ensuring its applicability across various scenarios. It effectively tackles both memorization and overfitting issues, thus improving the trade-off between robustness and generalization; it also helps mitigate "robust overfitting". Furthermore, our study provides insights into the mechanisms of selective adversarial training and offers a promising avenue for future research.
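The layer-wise study behind this finding (Figure 2, configuration U-23: the first and last ResNet-18 blocks frozen, the middle two updated during adversarial training) amounts to masking parameter updates by block. The sketch below assumes a toy parameter dictionary keyed by block name and plain SGD; `BLOCKS`, `trainable_blocks`, and `adversarial_step` are hypothetical helpers, and in real training `grads` would come from an adversarial loss rather than the constants used here.

```python
import numpy as np

# Hypothetical names for the four ResNet-18 blocks used in the layer-wise study.
BLOCKS = ["block1", "block2", "block3", "block4"]


def trainable_blocks(config: str) -> set:
    """Map a label such as 'U-23' to the blocks updated in adversarial training.

    Assumed convention: the digits after 'U-' are 1-based block indices;
    all other blocks keep their standard-training (ST) weights.
    """
    prefix, _, digits = config.partition("-")
    # "1234" lists the valid indices for the four blocks in BLOCKS.
    if prefix != "U" or not digits or any(d not in "1234" for d in digits):
        raise ValueError(f"unexpected config label: {config!r}")
    return {BLOCKS[int(d) - 1] for d in digits}


def adversarial_step(params: dict, grads: dict, config: str, lr: float) -> dict:
    """One plain-SGD step that updates the selected blocks and conserves the rest."""
    active = trainable_blocks(config)
    return {
        name: (w - lr * grads[name]) if name in active else w.copy()
        for name, w in params.items()
    }


# Usage: all blocks start from (pretend) ST weights and receive the same gradient.
params = {b: np.ones(2) for b in BLOCKS}
grads = {b: np.full(2, 0.5) for b in BLOCKS}
new_params = adversarial_step(params, grads, "U-23", lr=0.1)
# block1 and block4 stay at 1.0; block2 and block3 become 0.95
```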

Paper Structure (28 sections, 10 equations, 12 figures, 10 tables, 1 algorithm)

Figures (12)

  • Figure 1: Generalization-robustness trade-off on WideResNet-34-10 and CIFAR-10. CURE displays a better trade-off between standard and robust (C&W) performance.
  • Figure 2: (a) The four blocks of ResNet-18 considered in the layer-wise study. All layers are first trained on natural images (ST). The second row shows the example architecture for U-23, where the first and last blocks are frozen and the second and third are updated during adversarial training. (b) Standard generalization and robustness of different blocks of ResNet-18 on the CIFAR-10 dataset.
  • Figure 3: Representation similarity between robust and non-robust features in ResNet-18 trained on the CIFAR-10 dataset.
  • Figure 4: Adversarial test accuracy during training after layer-wise updating and conservation of ResNet-18 blocks on the CIFAR-10 dataset.
  • Figure 5: (a) Generalization and robustness trade-off; (b) performance across different perturbation strengths; (c) robust overfitting, on the CIFAR-10 dataset. CURE achieves a better trade-off and shows consistent robustness across increasing $\epsilon$ levels, while also mitigating robust overfitting.
  • ...and 7 more figures
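The caption of Figure 3 does not name the similarity measure used to compare robust and non-robust features. Linear centered kernel alignment (CKA) is a common choice for comparing layer representations, so the sketch below assumes it; `linear_cka` is a hypothetical helper, and the random matrices stand in for activations of the same test images taken from a standard and an adversarially trained model at one layer.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices computed on the same examples.

    x: (n_examples, d1) features, e.g. from a non-robust model at some layer.
    y: (n_examples, d2) features from a robust model on the same inputs.
    Returns a similarity in [0, 1]; it equals 1 for features that differ only
    by an orthogonal transform and isotropic scaling.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (examples)")
    x = x - x.mean(axis=0, keepdims=True)  # center every feature dimension
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))


# Usage with random stand-in features (real use: flattened layer activations).
rng = np.random.default_rng(0)
feats_standard = rng.normal(size=(100, 16))
feats_rotated = feats_standard @ np.linalg.qr(rng.normal(size=(16, 16)))[0]
print(linear_cka(feats_standard, feats_rotated))  # ~1.0: same features, rotated
```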