Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection
Yinting Wu, Pai Peng, Bo Cai, Le Li
TL;DR
Batch-in-Batch (BB) is an adversarial training framework that jointly designs initial perturbations for multiple copies of each batch and selects informative adversarial samples after the attack. By duplicating batches, generating initial perturbations with tuned Latin Hypercube Sampling (tLHS), and applying one of the CP, GS, or BG sample selection strategies, BB raises adversarial accuracy on CIFAR-10, CIFAR-100, and SVHN under both N-FGSM and PGD-10 training, with gains of up to 13.34 percentage points on SVHN at $\epsilon=8/255$ over the N-FGSM baseline. Models trained with BB show smoother loss landscapes, more consistent attack success rates, and less overconfident outputs, while training remains cost-effective because perturbations are generated in parallel and the final batches are smaller. Overall, BB is a flexible plug-in for improving adversarial training robustness across architectures and datasets, with clear guidance on perturbation design and selection strategy.
Abstract
Adversarial training methods commonly generate independent initial perturbations for adversarial samples from a simple uniform distribution, and form the training batch for the classifier without any selection. In this work, we propose a simple yet effective training framework called Batch-in-Batch (BB) to enhance model robustness. It combines two components: a joint construction of initial values that simultaneously generates $m$ sets of perturbations from the original batch to provide more diversity among adversarial samples, and a set of sample selection strategies that give the trained models smoother losses and help them avoid overconfident outputs. Through extensive experiments on three benchmark datasets (CIFAR-10, SVHN, CIFAR-100) with two networks (PreActResNet18 and WideResNet28-10), trained with both single-step (Noise-Fast Gradient Sign Method, N-FGSM) and multi-step (Projected Gradient Descent, PGD-10) adversarial training, we show that models trained within the BB framework consistently achieve higher adversarial accuracy across various adversarial settings, notably an improvement of over 13% on the SVHN dataset with an attack radius of 8/255 compared to the N-FGSM baseline model. Furthermore, experimental analysis of the efficiency of both the proposed initial perturbation method and the sample selection strategies validates our insights. Finally, we show that our framework is cost-effective in terms of computational resources, even with a relatively large value of $m$.
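To make the idea of jointly constructed initial perturbations concrete, the following is a minimal sketch of plain Latin Hypercube Sampling across $m$ copies of one input. It is an illustration under stated assumptions, not the tuned variant (tLHS) used in the paper, whose details are not given in this section. The function name `lhs_initial_perturbations` and its parameters are hypothetical. For each coordinate, the interval $[-\epsilon, \epsilon]$ is split into $m$ equal strata, and each of the $m$ copies receives a value from a different stratum, so the copies cover the interval more evenly than $m$ independent uniform draws would.

```python
import random


def lhs_initial_perturbations(num_coords, m, eps, seed=None):
    """Return m perturbation vectors of length num_coords in [-eps, eps].

    Assumption: plain Latin Hypercube Sampling, not the paper's tuned variant.
    For every coordinate, [-eps, eps] is cut into m equal-width strata and
    each of the m copies gets one value drawn from a distinct stratum.
    """
    rng = random.Random(seed)
    width = 2.0 * eps / m  # width of one stratum
    copies = [[0.0] * num_coords for _ in range(m)]
    for j in range(num_coords):
        strata = list(range(m))
        rng.shuffle(strata)  # random assignment of strata to copies
        for k, s in enumerate(strata):
            # uniform draw inside stratum s
            copies[k][j] = -eps + (s + rng.random()) * width
    return copies


# Example: 4 jointly sampled perturbations for a 5-dimensional input.
perturbations = lhs_initial_perturbations(num_coords=5, m=4, eps=8 / 255, seed=0)
```

In an actual training loop the same construction would be applied to image tensors, with each of the $m$ perturbed copies of the batch then passed to the attack (for example N-FGSM or PGD-10) before sample selection.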
