Better Diffusion Models Further Improve Adversarial Training

Zekai Wang; Tianyu Pang; Chao Du; Min Lin; Weiwei Liu; Shuicheng Yan

TL;DR

The paper shows that high-quality synthetic data generated by a state-of-the-art class-conditional elucidating diffusion model (EDM) further improves adversarial training, setting new state-of-the-art robust accuracy on RobustBench for CIFAR-10 and CIFAR-100 without external data. Extensive ablations on data quantity, data quality, augmentation, and training hyperparameters show that larger amounts of higher-quality (lower-FID) generated data reduce robust overfitting and improve both clean and robust accuracy. Key findings include the superiority of class-conditional generation, the benefit of generated-data quantities around 1M images, and the importance of appropriate training settings (batch size, β in TRADES, label smoothing). The work underscores the potential of diffusion-generated training data to substantially improve robustness, while also highlighting efficiency considerations for practical deployment.
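To make the role of β and label smoothing concrete, the following is a minimal sketch of a TRADES-style objective for a single example: a (label-smoothed) cross-entropy term on the clean logits plus β times the KL divergence between the clean and adversarial predictions. It is not the authors' implementation. The function names, the smoothing convention (mass `smoothing` spread uniformly over all classes), and the numeric values are illustrative assumptions, and the adversarial logits are taken as given rather than found by an inner attack.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_cross_entropy(logits, label, smoothing=0.0):
    """Cross-entropy against a label-smoothed target.

    Assumed convention: the target is (1 - smoothing) * one_hot + smoothing / K.
    """
    probs = softmax(logits)
    k = len(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        target = smoothing / k + ((1.0 - smoothing) if i == label else 0.0)
        loss -= target * math.log(p)
    return loss


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def trades_loss(clean_logits, adv_logits, label, beta, smoothing=0.0):
    """Clean cross-entropy plus beta times KL(clean prediction || adversarial prediction)."""
    natural = smoothed_cross_entropy(clean_logits, label, smoothing)
    robust = kl_divergence(softmax(clean_logits), softmax(adv_logits))
    return natural + beta * robust


# Hypothetical logits for a 3-class problem; the values are illustrative only.
clean = [2.0, 0.5, -1.0]
adv = [0.5, 1.5, -1.0]
loss_small_beta = trades_loss(clean, adv, label=0, beta=1.0, smoothing=0.1)
loss_large_beta = trades_loss(clean, adv, label=0, beta=5.0, smoothing=0.1)
```

A larger β puts more weight on agreement between clean and adversarial predictions, which is why it acts as a knob between clean and robust accuracy.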

Abstract

It has been recognized that the data generated by the denoising diffusion probabilistic model (DDPM) improves adversarial training. After two years of rapid development in diffusion models, a question naturally arises: can better diffusion models further improve adversarial training? This paper gives an affirmative answer by employing the most recent diffusion model, which has higher efficiency ($\sim 20$ sampling steps) and image quality (lower FID score) compared with DDPM. Our adversarially trained models achieve state-of-the-art performance on RobustBench using only generated data (no external datasets). Under the $\ell_\infty$-norm threat model with $\epsilon=8/255$, our models achieve $70.69\%$ and $42.67\%$ robust accuracy on CIFAR-10 and CIFAR-100, respectively, i.e., improving upon previous state-of-the-art models by $+4.58\%$ and $+8.03\%$. Under the $\ell_2$-norm threat model with $\epsilon=128/255$, our models achieve $84.86\%$ on CIFAR-10 ($+4.44\%$). These results also beat previous works that use external data. We also provide compelling results on the SVHN and TinyImageNet datasets. Our code is available at https://github.com/wzekai99/DM-Improves-AT.
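The two threat models in the abstract bound how far an adversarial image may move from the original: at most $\epsilon=8/255$ per pixel under $\ell_\infty$, and at most $\epsilon=128/255$ in Euclidean distance under $\ell_2$. The sketch below shows the projection step that keeps a perturbed input inside each ball. It assumes pixel values scaled to $[0, 1]$ and flattened into a plain list; the function names and sample values are hypothetical and are not taken from the authors' code.

```python
import math


def clip01(v):
    """Clamp a pixel value to the valid range [0, 1] (assumed pixel scaling)."""
    return min(max(v, 0.0), 1.0)


def project_linf(x, x_adv, eps):
    """Project x_adv onto the l_inf ball of radius eps around x, then onto [0, 1]."""
    return [clip01(min(max(a, xi - eps), xi + eps)) for xi, a in zip(x, x_adv)]


def project_l2(x, x_adv, eps):
    """Rescale the perturbation so its l_2 norm is at most eps, then clip to [0, 1]."""
    delta = [a - xi for xi, a in zip(x, x_adv)]
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, eps / norm) if norm > 0.0 else 1.0
    return [clip01(xi + scale * d) for xi, d in zip(x, delta)]


# Hypothetical 3-pixel "image" and an over-sized perturbation.
x = [0.5, 0.5, 0.0]
x_adv = [1.0, 1.0, 0.5]
inside_linf = project_linf(x, x_adv, 8 / 255)
inside_l2 = project_l2(x, x_adv, 128 / 255)
```

Because the original pixels already lie in $[0, 1]$, the final clipping can only shrink each coordinate of the perturbation, so the result stays inside the ball.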

Paper Structure (19 sections, 3 equations, 6 figures, 13 tables)

Introduction
Related Work
Experiment Setup
Comparison with State-of-the-Art
How Generated Data Influence Robustness
Early Stopping and Number of Epochs
Amount of Generated Data
Data Augmentation
Quality of Generated Data
Sensitivity Analysis
Sensitivity Analysis
Discussion
Technical Details
Additional Experiments
Original-to-Generated Ratio
...and 4 more sections

Figures (6)

Figure 1: Robust accuracy (against AutoAttack) and clean accuracy of top-rank models (no external datasets) in the leaderboard of RobustBench. The publication year of top-rank models is indicated by different colors. Our models use the WRN-28-10 and WRN-70-16 architectures in each setting, and detailed accuracy values are provided in Table \ref{tab:sota_cifar10} and Table \ref{tab:sota_cifar100}.
Figure 2:
Figure 3: Clean accuracy and robust accuracy against PGD-40 and AA with respect to original-to-generated ratios (0 means generated images only, 1 means CIFAR-10 training set only). We train WRN-28-10 models against ($\ell_{\infty}$, $\epsilon=8/255$) on CIFAR-10 using 1M generated data.
Figure 4: Clean and PGD robust accuracy of adversarial training using different amounts of generated data.
Figure 9: Test accuracy (%) with different values of batch size (left), label smoothing (LS) (middle), and $\beta$ in TRADES (right), under the ($\ell_{\infty}$, $\epsilon=8/255$) threat model on CIFAR-10.
...and 1 more figure
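The original-to-generated ratio in the Figure 3 caption (0 means generated images only, 1 means the CIFAR-10 training set only) can be read as the fraction of each training batch drawn from the original data. The sketch below shows one common way to realise such a ratio, by fixing the per-batch composition. Whether the paper mixes data exactly this way is an assumption, and the helper name and the batch size of 512 are hypothetical.

```python
def split_batch(batch_size, original_ratio):
    """Return (n_original, n_generated) for one batch.

    original_ratio follows the Figure 3 convention:
    0.0 -> generated images only, 1.0 -> original training images only.
    """
    if not 0.0 <= original_ratio <= 1.0:
        raise ValueError("original_ratio must lie in [0, 1]")
    n_original = round(batch_size * original_ratio)
    return n_original, batch_size - n_original


# Hypothetical batch size; the ratio values sweep the two extremes and a mix.
only_generated = split_batch(512, 0.0)
only_original = split_batch(512, 1.0)
mixed = split_batch(512, 0.25)
```

Fixing the composition per batch, rather than sampling from a pooled dataset, keeps the ratio constant even when the generated set is far larger than the original one.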