Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks

Rui Wang, Zeming Wei, Xiyue Zhang, Meng Sun

TL;DR

This work tackles robust generalization of DNNs to unforeseen adversarial attacks by casting multi-type fine-tuning as a multi-armed bandit problem and introducing Calibrated Adversarial Sampling (CAS). CAS leverages a dynamic reward design that combines marginal robustness gains with cross-type trade-offs and employs a UCB-based sampling strategy to balance exploration and exploitation during fine-tuning of pre-trained robust models. Theoretical analysis under Robbins-Monro conditions establishes convergence and a safe parameter-drift threshold, while experiments on CIFAR-10, CIFAR-100, and SVHN across 21 attack types demonstrate superior overall robustness and preserved clean accuracy compared to baselines. Ablation studies validate the necessity of the cross-type reward and UCB mechanism, highlighting CAS as a scalable, practical paradigm for robust generalization against unforeseen adversarial perturbations.
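
A minimal sketch of UCB-based sampling over attack types follows, treating each attack type as one bandit arm. It assumes the textbook UCB1 rule (empirical mean reward plus an exploration bonus $c\sqrt{\ln t / n}$ with $c=\sqrt{2}$). CAS's actual exploration term, constant, and reward are not specified in this summary. The names `ucb_select` and `run_bandit` and the toy noisy reward are hypothetical stand-ins for one round of fine-tuning followed by evaluation.

```python
import math
import random


def ucb_select(counts, means, t, c=math.sqrt(2)):
    """Pick the arm (attack type) with the highest UCB1 score.

    Untried arms are pulled first, so every attack type is fine-tuned on
    at least once before the confidence bonus takes over.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        means[arm] + c * math.sqrt(math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, n_rounds, c=math.sqrt(2)):
    """Run UCB1 for n_rounds.

    ASSUMPTION: reward_fn(arm) is a hypothetical stand-in for "fine-tune on
    this attack type, then measure the reward"; it is not the CAS reward.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        arm = ucb_select(counts, means, t, c)
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means


# Toy environment: each arm has a fixed expected reward plus Gaussian noise.
rng = random.Random(0)
true_rewards = [0.2, 0.5, 0.8]
counts, means = run_bandit(
    lambda arm: true_rewards[arm] + rng.gauss(0.0, 0.1), n_arms=3, n_rounds=2000
)
```

In this toy run most rounds go to the arm with the highest expected reward, while the others are still sampled occasionally. That is the exploration/exploitation balance the TL;DR refers to. In a real fine-tuning setting the rewards are non-stationary, which the plain UCB1 rule does not account for.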

Abstract

Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these vulnerabilities, adversarial training (AT) has emerged as one of the most effective paradigms for enhancing the robustness of DNNs. However, existing AT frameworks primarily focus on a single attack type or a limited set of them, leaving DNNs exposed to attack types that may be encountered in practice but are not addressed during training. In this paper, we propose an efficient fine-tuning method called Calibrated Adversarial Sampling (CAS) to address this gap. Formulating fine-tuning as an optimization problem within the multi-armed bandit framework, CAS dynamically designs rewards and balances exploration and exploitation, accounting for the dynamic and interdependent nature of multiple robustness dimensions. Experiments on benchmark datasets show that CAS achieves superior overall robustness while maintaining high clean accuracy, providing a new paradigm for robust generalization of DNNs.
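
To make "rewards that combine marginal robustness gains with cross-type trade-offs" concrete, here is a hypothetical reward. It takes the change in robust accuracy on the chosen attack type and adds `beta` times the mean change on all other types. The vector of per-type changes is one row of a trade-off matrix like the one in Figure 3. The functional form, the weight `beta`, and the names `tradeoff_row` and `calibrated_reward` are assumptions for illustration, not the paper's definition.

```python
def tradeoff_row(acc_before, acc_after):
    """Change in robust accuracy per attack type after one fine-tuning round.

    This corresponds to one row of a Figure 3-style trade-off matrix.
    """
    if len(acc_before) != len(acc_after):
        raise ValueError("accuracy vectors must have the same length")
    return [after - before for before, after in zip(acc_before, acc_after)]


def calibrated_reward(acc_before, acc_after, arm, beta=0.5):
    """HYPOTHETICAL reward for fine-tuning on attack type `arm`.

    marginal: gain on the attack type that was just trained on.
    cross:    mean change on every other attack type (negative if training
              on `arm` hurt the others, i.e. a cross-type trade-off).
    """
    deltas = tradeoff_row(acc_before, acc_after)
    marginal = deltas[arm]
    others = [d for j, d in enumerate(deltas) if j != arm]
    cross = sum(others) / len(others) if others else 0.0
    return marginal + beta * cross
```

For example, if fine-tuning on type 0 moves robust accuracy from `[0.50, 0.40, 0.30]` to `[0.56, 0.37, 0.31]`, the marginal gain is 0.06. The cross-type term is -0.01, so the reward with `beta=0.5` is roughly 0.055. A type that improves itself while degrading the others is therefore rewarded less than its raw gain suggests.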

Paper Structure

This paper contains 51 sections, 5 theorems, 51 equations, 8 figures, 5 tables, and 3 algorithms.

Key Result

Lemma 1

The change in average robust risk $\Delta\mathcal{R}_{avg}$ can be bounded as: … where $\psi$ denotes the angle between $\nabla\mathcal{R}_{avg}(\theta_1)$ and $\nabla\mathcal{R}_{q}(\theta_1)$, $H_{avg}$ is the local Hessian of $\mathcal{R}_{avg}$ at $\theta_1$, and $\lambda_{\max}(H_{avg})$ denotes its largest eigenvalue.
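
The exact inequality of Lemma 1 is not shown here. The following standard second-order argument produces terms of the same shape. It assumes a single gradient step on attack type $q$ with a hypothetical step size $\eta$, writes $g_q = \nabla\mathcal{R}_{q}(\theta_1)$, and drops terms beyond second order. This is a sketch, not the paper's proof.

```latex
\begin{aligned}
\Delta\mathcal{R}_{avg}
  &= \mathcal{R}_{avg}(\theta_1 - \eta\, g_q) - \mathcal{R}_{avg}(\theta_1) \\
  &\approx -\eta\,\langle \nabla\mathcal{R}_{avg}(\theta_1),\, g_q \rangle
     + \tfrac{\eta^2}{2}\, g_q^{\top} H_{avg}\, g_q \\
  &\le -\eta\,\|\nabla\mathcal{R}_{avg}(\theta_1)\|\,\|g_q\|\cos\psi
     + \tfrac{\eta^2}{2}\,\lambda_{\max}(H_{avg})\,\|g_q\|^{2}
\end{aligned}
```

The first term uses $\langle a, b\rangle = \|a\|\|b\|\cos\psi$. The second uses $v^{\top} H v \le \lambda_{\max}(H)\|v\|^2$ for symmetric $H$. When $\cos\psi > 0$ and $\eta$ is small, the linear term dominates and the average risk decreases. A large step or strong curvature can make the quadratic term win.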

Figures (8)

  • Figure 1: Illustration of adversarial perturbations by different attack types from hsiung2023towards.
  • Figure 2: An overview of our CAS framework.
  • Figure 3: Trade-off matrix visualization. Each entry represents the change in robust accuracy of a specific attack type (shown along the top) after 3 epochs of sequential fine-tuning against designated attack types (shown on the left).
  • Figure 4: Ablation study examining the effect of training epochs using SVHN. Robust accuracy is the weighted average, and all results are reported as the mean over five independent runs.
  • Figure 5: Ablation study on the number of considered perturbations using CIFAR-10. The 18 semantic attacks (Wood, Elastic, Pixel, Snow, Gabor, JPEG, Glitch, Kaleidoscope, Blur, Edge, Fog, Texture, Prison, Whirlpool, Polkadot, Klotski, and Hsv) are divided into six groups, each containing three attacks in order from left to right. These groups are incorporated one at a time, along with the three $\ell_p$ attacks, into CAS fine-tuning. Each $\ell_p$ attack is assigned a weight of 6, and each considered semantic attack is assigned a weight of 1. We report the average robust accuracy within each group and the weighted average accuracy across all attacks. Results are averaged over five independent runs.
  • ...and 3 more figures
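
The Figure 5 metrics can be sketched as follows. A weight of 6 applies to each $\ell_p$ attack and a weight of 1 to each semantic attack, as stated in the caption. Within-group averages are taken as plain means over consecutive groups of three. The dictionary keys for the $\ell_p$ attacks are placeholders (this summary does not name the norms), and the helper names are hypothetical.

```python
LP_WEIGHT = 6        # per l_p attack, from the Figure 5 caption
SEMANTIC_WEIGHT = 1  # per semantic attack, from the Figure 5 caption


def weighted_robust_accuracy(lp_acc, semantic_acc):
    """Weighted mean robust accuracy over all considered attacks.

    lp_acc, semantic_acc: dicts mapping attack name -> robust accuracy in [0, 1].
    """
    num = LP_WEIGHT * sum(lp_acc.values()) + SEMANTIC_WEIGHT * sum(semantic_acc.values())
    den = LP_WEIGHT * len(lp_acc) + SEMANTIC_WEIGHT * len(semantic_acc)
    if den == 0:
        raise ValueError("no attacks given")
    return num / den


def group_averages(semantic_acc, names, group_size=3):
    """Plain mean within consecutive groups of `group_size` semantic attacks."""
    groups = [names[i:i + group_size] for i in range(0, len(names), group_size)]
    return [sum(semantic_acc[n] for n in g) / len(g) for g in groups]


# Placeholder keys "lp_a", "lp_b", "lp_c" stand in for the three l_p attacks.
lp = {"lp_a": 0.5, "lp_b": 0.6, "lp_c": 0.4}
semantic = {"Wood": 0.3, "Elastic": 0.2, "Pixel": 0.1}
overall = weighted_robust_accuracy(lp, semantic)  # (6 * 1.5 + 0.6) / 21
first_group = group_averages(semantic, ["Wood", "Elastic", "Pixel"])
```

With all 18 semantic attacks included, the three $\ell_p$ attacks carry a total weight of $3 \times 6 = 18$, equal to the total semantic weight of $18 \times 1$. The two attack families therefore each contribute half of the weighted average.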

Theorems & Definitions (8)

  • Lemma 1: Average robust risk bound
  • Proposition 1: Safe parameter-drift threshold
  • Lemma 1: Average robust risk bound
  • Proof 1
  • Proposition 1: Safe parameter-drift threshold
  • Proof 2
  • Theorem 1: Convergence of CAS
  • Proof 3
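
The TL;DR states that the convergence analysis holds under Robbins-Monro conditions. In their standard form these are conditions on the step sizes $\eta_t$ of a stochastic iteration. How Theorem 1 maps them onto CAS's own quantities is not shown in this summary. The familiar choice $\eta_t = \eta_0 / t$ satisfies both.

```latex
\sum_{t=1}^{\infty} \eta_t = \infty,
\qquad
\sum_{t=1}^{\infty} \eta_t^{2} < \infty;
\qquad
\text{e.g. } \eta_t = \frac{\eta_0}{t}:\;
\sum_{t} \frac{\eta_0}{t} = \infty,\;
\sum_{t} \frac{\eta_0^{2}}{t^{2}} = \frac{\pi^{2}}{6}\,\eta_0^{2} < \infty .
```

The first condition lets the iterates travel arbitrarily far, so they can reach a solution from any start. The second makes the accumulated noise variance finite, so the iterates eventually settle.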