Wasserstein distributional adversarial training for deep neural networks

Xingjian Bai, Guangyi He, Yifan Jiang, Jan Obloj

TL;DR

The paper tackles adversarial robustness under distributional threats by extending TRADES to Wasserstein distributionally robust optimization (W-DRO), and it derives a first-order, sensitivity-based approximation that makes training practical. It introduces a W-DRO reformulation and a W-PGD-budget attack strategy, along with a fine-tuning protocol (randomizing the last layer or applying small perturbations) that upgrades pre-trained models without sacrificing their existing pointwise robustness. Empirical validation on multiple RobustBench networks trained on CIFAR-10 shows consistent improvements in Wasserstein distributional robustness, with gains that vary with the scale of the data used in prior training; some gains persist even when fine-tuning uses only the original data. The work provides a scalable, relatively inexpensive way to strengthen distributional adversarial defenses and offers guidance for applying W-DRO fine-tuning to existing models in practice.

Abstract

The design of adversarial attacks for deep neural networks and methods of adversarial training against them are the subject of intense research. In this paper, we propose methods to train against distributional attack threats, extending the TRADES method used for pointwise attacks. Our approach leverages recent contributions and relies on sensitivity analysis for Wasserstein distributionally robust optimization problems. We introduce an efficient fine-tuning method which can be deployed on a previously trained model. We test our methods on a range of pre-trained models on RobustBench. These experimental results demonstrate that the additional training enhances Wasserstein distributional robustness while maintaining original levels of pointwise robustness, even for already very successful networks. The improvements are less marked for models pre-trained using huge synthetic datasets of 20-100M images. Remarkably, however, our methods can sometimes still improve the performance of these models even when fine-tuning uses only the original training dataset (50k images).
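To make the contrast between pointwise and distributional threats concrete, the following is a minimal sketch and not the authors' implementation. It assumes perturbations are measured in the $l_\infty$ norm and that the distributional budget is a Wasserstein-$p$ radius with $p=2$; the function names and the toy numbers are hypothetical. It relies on the standard fact that moving each sample $x_i$ to $x_i + d_i$ defines a coupling, so the $p$-th root of the mean of $\|d_i\|^p$ upper-bounds the Wasserstein-$p$ distance between the clean and perturbed empirical distributions.

```python
def linf_norm(delta):
    """l_inf norm of one perturbation vector (assumed norm, for illustration)."""
    return max(abs(d) for d in delta)


def pointwise_ok(perturbations, budget):
    """Pointwise threat: every single perturbation must stay within the budget."""
    return all(linf_norm(d) <= budget for d in perturbations)


def distributional_ok(perturbations, budget, p=2):
    """Distributional threat (sufficient check): the p-th root of the mean
    p-th power of the perturbation norms must stay within the budget.
    This bounds the Wasserstein-p distance from above via the coupling
    that pairs each clean sample with its own perturbed copy."""
    mean_p = sum(linf_norm(d) ** p for d in perturbations) / len(perturbations)
    return mean_p ** (1.0 / p) <= budget


# Hypothetical toy data: four samples, budget 0.1.
budget = 0.1
# The attacker spends the whole budget on one sample and leaves the rest alone.
concentrated = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.18, -0.05]]
# The attacker perturbs every sample by the same small amount.
uniform = [[0.05, -0.05]] * 4

print(pointwise_ok(concentrated, budget))       # one sample exceeds 0.1
print(distributional_ok(concentrated, budget))  # average cost is still small
print(pointwise_ok(uniform, budget), distributional_ok(uniform, budget))
```

In this toy case the concentrated perturbation violates the pointwise constraint but satisfies the distributional one, which illustrates why a distributional adversary can be strictly stronger than a pointwise adversary with the same budget: it may redistribute its budget unevenly across samples.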

Paper Structure

This paper contains 25 sections, 1 theorem, 13 equations, 4 figures, 11 tables, 1 algorithm.

Key Result

Theorem 4.1

Assume the map $(x,x',y)\mapsto J_{\theta}(x,x',y)$ is Lipschitz. Then the following first-order approximations hold:

Figures (4)

  • Figure 7.1: The estimated $\Upsilon$ from 10% of the training set and its reference value for 5 pre-trained networks.
  • Figure 7.2: Performance of Zhang et al. (2019) under $\mathcal{W}_{2}$ adversarial attack on the test set (solid) and the validation set (dashed) along fine-tuning.
  • Figure 7.3: Performance of Zhang et al. (2019) under $\mathcal{W}_{\infty}$ adversarial attack on the test set (solid) and the validation set (dashed) along fine-tuning.
  • Figure 7.4: $\Upsilon$ of Zhang et al. (2019) along fine-tuning evaluated on the test set (solid) and the training set (dashed).

Theorems & Definitions (1)

  • Theorem 4.1