Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection

Yinting Wu, Pai Peng, Bo Cai, Le Li

TL;DR

Batch-in-Batch (BB) is an adversarial training (AT) framework that jointly designs initial perturbations for multiple copies of each batch and selects informative adversarial samples after the attack. By duplicating batches, generating perturbations with a tuned Latin Hypercube Sampling (tLHS), and applying CP, GS, or BG sample selection, BB raises adversarial accuracy on CIFAR-10, CIFAR-100, and SVHN for both N-FGSM and PGD-10, with gains of up to 13.34 percentage points on SVHN at $\epsilon=8/255$. Models trained with BB show smoother loss landscapes, more consistent attack success rates, and lower model confidence, and the framework remains cost-effective because perturbations are generated in parallel and the final batches are smaller. Overall, BB is a flexible plug-in for improving AT robustness across architectures and datasets, and it comes with guidance on perturbation design and selection strategy.

Abstract

Adversarial training methods commonly generate independent initial perturbations for adversarial samples from a simple uniform distribution, and obtain the training batch for the classifier without selection. In this work, we propose a simple yet effective training framework called Batch-in-Batch (BB) to enhance model robustness. Specifically, it involves a joint construction of initial values that simultaneously generates $m$ sets of perturbations from the original batch to provide more diversity for adversarial samples; it also includes various sample selection strategies that enable the trained models to have smoother losses and avoid overconfident outputs. Through extensive experiments on three benchmark datasets (CIFAR-10, SVHN, CIFAR-100) with two networks (PreActResNet18 and WideResNet28-10), used in both single-step (Noise-Fast Gradient Sign Method, N-FGSM) and multi-step (Projected Gradient Descent, PGD-10) adversarial training, we show that models trained within the BB framework consistently have higher adversarial accuracy across various adversarial settings, notably achieving over a 13% improvement on the SVHN dataset with an attack radius of 8/255 compared to the N-FGSM baseline model. Furthermore, experimental analysis of the efficiency of both the proposed initial perturbation method and the sample selection strategies validates our insights. Finally, we show that our framework is cost-effective in terms of computational resources, even with a relatively large value of $m$.

Paper Structure

This paper contains 24 sections, 8 equations, 11 figures, 4 tables, and 2 algorithms.

Figures (11)

  • Figure 1: The structure of the Batch-in-Batch framework
  • Figure 2: A 2D example of LHS. Given an original sample $\bm{x}_0$, its $\ell_{\infty}(\epsilon)$ vicinity, and a desired sample number $m=3$, LHS first divides the vicinity into $m^2=9$ squares ($m^d$ hypercubes in a $d$-dimensional space), then draws samples in orthogonally selected squares
  • Figure 3: Loss landscapes of the six models considered in Table 1, where the first row corresponds to losses of N-FGSM based models and the second row to losses of PGD-10 based models. Following the methodology described in engstrom2018evaluating, local losses are calculated as $L(f_{\bm{\theta}}(\bm{x}+t_{1}\bm{r}_1+t_{2}\bm{r}_2),y)$, where $(\bm{x}, y)$ is an original CIFAR-10 image (i.e., a frog) with its label; $\bm{r}_1=\text{sign}[\nabla_{\bm{x}}L(f_{\bm{\theta}}(\bm{x}),y)]$ is the gradient direction and $\bm{r}_2 \sim \mathrm{Rademacher}(0.5)$ is a random direction; $t_1$ and $t_2$ are evenly spaced scalars on $[-0.1, 0.1]$. Numbers on the z-axis of each subplot represent the standard deviation of the losses
  • Figure 4: The success rates of the N-FGSM attack (blue line) and the PGD-10 attack (orange line) against two baseline models (N-FGSM-plain and PGD-10-plain) and two models (BB(GS)-N-FGSM and BB(GS)-PGD-10) trained in the BB framework. The success rates of the two attacks for each model are calculated on the CIFAR-10 training set during the training process
  • Figure 5: The left (resp. right) figure presents the mean cross-entropy values of the plain N-FGSM (resp. PGD-10) model and its BB version during the training process, where the mean values are computed on both the training and test sets of CIFAR-10 with attack radius $\epsilon=8/255$
  • ...and 6 more figures
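
To make the sampling scheme in the Figure 2 caption concrete, the following is a minimal sketch of classic Latin Hypercube Sampling inside an $\ell_{\infty}(\epsilon)$ vicinity. It is not the paper's tuned variant (tLHS), whose details are not given here. The function name `lhs_perturbations` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def lhs_perturbations(m, d, epsilon, rng=None):
    """Draw m perturbations in [-epsilon, epsilon)^d with classic LHS.

    Illustrative sketch only: this is plain LHS, not the paper's tLHS.
    """
    rng = np.random.default_rng(rng)
    # Each axis is cut into m equal strata. One independent permutation
    # per axis ensures every stratum is used by exactly one sample.
    strata = np.stack([rng.permutation(m) for _ in range(d)], axis=1)  # (m, d)
    # Jitter uniformly inside the assigned stratum; values lie in [0, 1).
    unit = (strata + rng.random((m, d))) / m
    # Rescale from [0, 1) to the l_inf ball [-epsilon, epsilon).
    return (2.0 * unit - 1.0) * epsilon

# The setting of Figure 2: m = 3 samples in a 2D vicinity of radius 8/255.
deltas = lhs_perturbations(m=3, d=2, epsilon=8 / 255, rng=0)
print(deltas)
```

Each row of `deltas` is one initial perturbation to add to $\bm{x}_0$. Along every coordinate, the $m$ samples fall in $m$ distinct strata, which is what spreads them more evenly over the vicinity than $m$ independent uniform draws would. SciPy's `scipy.stats.qmc.LatinHypercube` offers a library implementation of the same basic idea.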