NPAT Null-Space Projected Adversarial Training Towards Zero Deterioration

Hanyi Hu, Qiao Han, Kui Chen, Yao Yang

TL;DR

Adversarial training often trades clean accuracy for robustness. NPAT uses a null-space projector $P_{Null(W)}$ to constrain adversarial perturbations or gradients to the null space of a pretrained high-accuracy model's decision boundary, giving two variants: NPDA for data augmentation and NPGD for gradient-based updates. On CIFAR10 and SVHN the approach reaches robustness comparable to state-of-the-art methods while keeping clean accuracy close to that of the original high-accuracy model, and analyses show stability across choices of the adversarial coefficient $\beta$ and of the hidden-space dimension. Since both integrate with existing loss formulations, NPDA and NPGD serve as practical add-ons that improve adversarial robustness without sacrificing performance on clean data.
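A minimal NumPy sketch of the projection idea follows. It assumes the standard closed form $P = I - W^{+}W$ (Moore-Penrose pseudoinverse) and treats $W$ as a last-layer weight matrix acting on hidden features. The helper names, dimensions, and random stand-in data are hypothetical, and the two update rules only illustrate what constraining perturbations or gradients to the null space means; they are not the authors' NPDA or NPGD implementation.

```python
import numpy as np


def null_space_projector(W: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto Null(W): P = I - pinv(W) @ W.

    W has shape (m, d), e.g. a weight matrix mapping d-dim hidden
    features to m logits. P has shape (d, d) and W @ P is ~0.
    """
    d = W.shape[1]
    return np.eye(d) - np.linalg.pinv(W) @ W


def project_perturbation(P: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """NPDA-style (assumed): keep only the component of a deviation in Null(W)."""
    return P @ delta


def projected_gradient_descent(P, z0, grad_fn, lr=0.1, steps=50):
    """NPGD-style (assumed): every update direction is projected into Null(W)."""
    z = z0.copy()
    for _ in range(steps):
        z = z - lr * (P @ grad_fn(z))
    return z


rng = np.random.default_rng(0)
m, d = 4, 16                      # hypothetical: 4 classes, 16-dim hidden features
W = rng.standard_normal((m, d))   # stand-in for a pretrained last-layer weight W_L
h = rng.standard_normal(d)        # stand-in for a clean sample's hidden representation
P = null_space_projector(W)

# Projected deviation: the logits W @ h are left unchanged.
delta = 0.5 * np.sign(rng.standard_normal(d))   # a sign-gradient-like step
h_adv = h + project_perturbation(P, delta)
print(np.allclose(W @ h_adv, W @ h))            # True (up to float tolerance)

# Projected descent toward an arbitrary target: loss drops, logits stay fixed.
target = rng.standard_normal(d)


def grad_fn(z: np.ndarray) -> np.ndarray:
    """Gradient of the toy loss ||z - target||^2."""
    return 2.0 * (z - target)


z = projected_gradient_descent(P, h, grad_fn)
print(np.allclose(W @ z, W @ h))                               # True
print(np.sum((z - target) ** 2) < np.sum((h - target) ** 2))   # True
```

Only the component of a deviation that $W$ cannot see survives the projection, so the logits of the clean representation are preserved while the representation itself still moves. How the paper selects the projected layer and the hidden-space dimension is not specified in this summary.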

Abstract

To mitigate the susceptibility of neural networks to adversarial attacks, adversarial training has emerged as a prevalent and effective defense strategy. This countermeasure, however, incurs an intrinsic trade-off: it sacrifices the model's accuracy on normal samples. To reconcile this trade-off, we pioneer the incorporation of null-space projection into adversarial training and propose two Null-space Projection based Adversarial Training (NPAT) algorithms, one for sample generation and one for gradient optimization: Null-space Projected Data Augmentation (NPDA) and Null-space Projected Gradient Descent (NPGD). Both search for an overarching optimal solution that enhances robustness with almost zero deterioration in generalization performance. Adversarial samples and perturbations are constrained to the null space of the decision boundary using a closed-form null-space projector, effectively mitigating the threat of attacks stemming from unreliable features. Experiments on the CIFAR10 and SVHN datasets show that our methodology combines seamlessly with existing adversarial training methods, obtaining comparable robustness while keeping generalization close to that of a high-accuracy model.
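The closed-form projector is not spelled out in this summary. A standard closed form, assumed here for illustration (the paper's Definitions 2.1 and 2.2 may state it differently), uses the Moore-Penrose pseudoinverse $W^{+}$ of a weight matrix $W \in \mathbb{R}^{m \times d}$:

$$
P_{Null(W)} = I_d - W^{+} W, \qquad W P_{Null(W)} = 0, \qquad P_{Null(W)}^{2} = P_{Null(W)} = P_{Null(W)}^{\top}.
$$

The first identity follows from $W W^{+} W = W$. Consequently, for a representation $h$ and any deviation $\delta$, the projected deviation leaves the layer output unchanged:

$$
W\,(h + P_{Null(W)}\,\delta) = W h + W P_{Null(W)}\,\delta = W h .
$$

When $W$ has full row rank, $W^{+} = W^{\top}(W W^{\top})^{-1}$, and the null space has dimension $d - m$.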

Paper Structure

This paper contains 19 sections, 28 equations, 6 figures, 3 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Scatter Plot of Model Standard Accuracy vs. Robustness under Auto-attack on CIFAR10.
  • Figure 2: Illustrations of Null-Space Projection-Based Adversarial Training. a) Overall Structure of Adversarial Training Frameworks. b) An Illustration of the Multi-step Null-space Projection Sample Generation Process. Black arrows represent the direction of deviation caused by adversarial training; red arrows represent the direction of the null-space projected deviation caused by adversarial training in NPDA.
  • Figure 3: Distribution of Toy Sample Representation $y$. Toy distribution of standard and adversarial sample representations from typical adversarial training (Typical AT) and null projection-based adversarial training (NPAT). The top view visualizes two randomly selected dimensions from the column space of $W_L$. The front view visualizes one randomly selected dimension from the column space and one randomly selected dimension from the null space of $W_L$. The red arrow denotes the deviation from a standard sample to its adversarial peer.
  • Figure 4: Loss and Accuracy & Robustness Training Dynamics for 200 Epochs
  • Figure 5: Variation of Accuracy & Auto-attack Robustness w.r.t. Adversarial Coefficient $\beta$.

Theorems & Definitions (4)

  • Definition 2.1
  • Definition 2.2
  • Remark 3.1
  • Proof