TART: Boosting Clean Accuracy Through Tangent Direction Guided Adversarial Training

Bongsoo Yi, Rongjie Lai, Yao Li

TL;DR

This work addresses the clean accuracy drop often observed in adversarial training by introducing Tangent Direction Guided Adversarial Training (TART), which uses the data manifold's tangent space to adapt per-sample perturbations. TART estimates tangent spaces via an offline autoencoder–PCA pipeline, computes the tangential component of adversarial perturbations using the projection $\Pi_A = A(A^T A)^{-1}A^T$, and selects an adaptive bound $\epsilon_i$ based on whether the tangential component is above the batch median. The key contributions are (i) tangent-space estimation for each input, (ii) a per-sample perturbation strategy that emphasizes tangential over normal components, and (iii) theoretical justification coupled with empirical validation showing improved clean accuracy with preserved robustness across simulated and CIFAR-10 experiments, and compatibility with standard AT, TRADES, MART, and GAIRAT. The results suggest that incorporating geometric properties of data can enhance adversarial training efficiency and effectiveness in practical settings.
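
The projection and median rule described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes each tangent space is given as a matrix $A$ whose columns span it, and the function names, the array layout, the strict `>` comparison, and the lower bound of 0 (suggested by Figure 1, where the clean image is used for the lower half) are all assumptions.

```python
import numpy as np

def tangential_norms(deltas, bases):
    """Norm of each perturbation's projection onto its tangent space.

    deltas: (n, d) array of perturbations x_i^* - x_i.
    bases:  (n, d, k) array; the columns of bases[i] span the tangent space at x_i.
    """
    norms = np.empty(len(deltas))
    for i, (delta, A) in enumerate(zip(deltas, bases)):
        # Pi_A delta = A (A^T A)^{-1} A^T delta, computed through least squares
        # so that A^T A never has to be inverted explicitly.
        coeffs, *_ = np.linalg.lstsq(A, delta, rcond=None)
        norms[i] = np.linalg.norm(A @ coeffs)
    return norms

def adaptive_bounds(tan_norms, eps):
    """Assumed rule: eps_i = eps above the batch median, 0 otherwise."""
    return np.where(tan_norms > np.median(tan_norms), eps, 0.0)
```

With a bound of 0, a sample is effectively trained on its clean image, which matches the upper 50% / lower 50% split in the Figure 1 caption.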

Abstract

Adversarial training has been shown to be successful in enhancing the robustness of deep neural networks against adversarial attacks. However, this robustness is accompanied by a significant decline in accuracy on clean data. In this paper, we propose a novel method, called Tangent Direction Guided Adversarial Training (TART), that leverages the tangent space of the data manifold to ameliorate the existing adversarial defense algorithms. We argue that training with adversarial examples having large normal components significantly alters the decision boundary and hurts accuracy. TART mitigates this issue by estimating the tangent direction of adversarial examples and allocating an adaptive perturbation limit according to the norm of their tangential component. To the best of our knowledge, our paper is the first work to consider the concept of tangent space and direction in the context of adversarial defense. We validate the effectiveness of TART through extensive experiments on both simulated and benchmark datasets. The results demonstrate that TART consistently boosts clean accuracy while retaining a high level of robustness against adversarial attacks. Our findings suggest that incorporating the geometric properties of data can lead to more effective and efficient adversarial training methods.

Paper Structure

This paper contains 23 sections, 1 theorem, 12 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.1

For any function $f$, where $\text{TV}({\mathcal{P}}, {\mathcal{Q}})$ denotes the total variation distance between ${\mathcal{P}}$ and ${\mathcal{Q}}$.
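
For reference, the TV distance between two probability distributions is standardly defined as the largest difference in probability they assign to any event. The event symbol $S$ below is notation assumed here, not taken from the paper, and the block does not restate the proposition itself.

```latex
\text{TV}({\mathcal{P}}, {\mathcal{Q}}) = \sup_{S} \left| {\mathcal{P}}(S) - {\mathcal{Q}}(S) \right|
```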

Figures (5)

  • Figure 1: Overview of TART and comparison with standard adversarial training. Given a training image ${\bm{x}}_i$, we first generate an adversarial example ${\bm{x}}_i^*$ with a fixed perturbation bound $\epsilon$. Standard adversarial training trains a robust model using these examples ${\bm{x}}_i^*$. In contrast, TART first estimates and stores the tangent space of each training image offline using a pre-trained autoencoder and principal component analysis (PCA). Then, based on the stored tangent space information, TART computes the tangential component of ${\bm{x}}_i^*$. TART finally uses ${\bm{x}}_i^*$ for robust training if its tangential component falls within the upper 50%, and employs ${\bm{x}}_i$ if the tangential component is in the lower 50%.
  • Figure 2: Distribution of tangential components and angles. The adversarial examples were generated based on a VGG-16 model trained on clean CIFAR-10. The minimum, mean, and maximum values of the tangential component are 0.19, 0.34, and 0.58, respectively, while the corresponding values for the angles are 57$^\circ$, 74$^\circ$, and 81$^\circ$.
  • Figure 3: Illustration of Tangent Space Estimation. See Algorithm \ref{alg:compute_tan_angle} for a detailed description.
  • Figure 4: Decision boundary visualization for the toy problem by Rade and Moosavi-Dezfooli \cite{Rade2022HATReducingEM}: (Left) $x_3 = 0.85$, (Right) $x_2=0$. Standard AT considerably enhances robustness from 56% to 77% but results in a reduction in accuracy from 100% to 92%. TART recovers the accuracy to 96% and even slightly improves the robustness to 79%.
  • Figure 5: Loss vs. Mean of tangential components within a batch.
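
The captions for Figures 1 and 3 say that tangent spaces are estimated offline with a pre-trained autoencoder and PCA, but this summary does not spell out the procedure. The sketch below shows one plausible reading, not the paper's algorithm: decode small random perturbations of an input's latent code, then take the top principal directions of the resulting displacements. The `decode` callable, the function name, and the parameters `k`, `n_samples`, and `scale` are all hypothetical.

```python
import numpy as np

def estimate_tangent_basis(decode, z, k, n_samples=64, scale=1e-2, seed=0):
    """Estimate a k-dimensional tangent basis at the point decode(z).

    decode: hypothetical callable mapping a latent vector (m,) to an input (d,).
    Returns a (d, k) matrix with orthonormal columns.
    """
    rng = np.random.default_rng(seed)
    x = decode(z)
    # Decode nearby latent codes and record how the decoded point moves.
    diffs = np.stack([
        decode(z + scale * rng.standard_normal(z.shape)) - x
        for _ in range(n_samples)
    ])
    # Uncentered PCA: the top-k right singular vectors of the displacement
    # matrix are the directions in which the decoded point moves the most.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T
```

A basis computed this way can be stored per training image and reused during training, so the autoencoder is not needed inside the training loop.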

Theorems & Definitions (2)

  • Proposition 3.1
  • proof