Calibrated Adversarial Sampling: Multi-Armed Bandit-Guided Generalization Against Unforeseen Attacks
Rui Wang, Zeming Wei, Xiyue Zhang, Meng Sun
TL;DR
This work tackles robust generalization of DNNs to unforeseen adversarial attacks by casting multi-type fine-tuning as a multi-armed bandit problem and introducing Calibrated Adversarial Sampling (CAS). CAS leverages a dynamic reward design that combines marginal robustness gains with cross-type trade-offs and employs a UCB-based sampling strategy to balance exploration and exploitation during fine-tuning of pre-trained robust models. Theoretical analysis under Robbins-Monro conditions establishes convergence and a safe parameter-drift threshold, while experiments on CIFAR-10, CIFAR-100, and SVHN across 21 attack types demonstrate superior overall robustness and preserved clean accuracy compared to baselines. Ablation studies validate the necessity of the cross-type reward and UCB mechanism, highlighting CAS as a scalable, practical paradigm for robust generalization against unforeseen adversarial perturbations.
Abstract
Deep Neural Networks (DNNs) are known to be vulnerable to various adversarial perturbations. To address the safety concerns arising from these vulnerabilities, adversarial training (AT) has emerged as one of the most effective paradigms for enhancing the robustness of DNNs. However, existing AT frameworks primarily focus on a single attack type or a limited set of attack types, leaving DNNs exposed to attack types that may be encountered in practice but were not addressed during training. In this paper, we propose an efficient fine-tuning method called Calibrated Adversarial Sampling (CAS) to address this gap. CAS treats multi-type fine-tuning as an optimization problem within the multi-armed bandit framework: it designs rewards dynamically and balances exploration and exploitation, taking into account the dynamic and interdependent characteristics of multiple robustness dimensions. Experiments on benchmark datasets show that CAS achieves superior overall robustness while maintaining high clean accuracy, providing a new paradigm for robust generalization of DNNs.
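
To make the bandit framing above concrete, the following is a minimal sketch of a generic UCB1 selection loop in which each arm stands for one attack type. It is not the CAS algorithm itself: the function names (`ucb1_select`, `run_bandit`), the exploration constant `c`, the attack-type labels, and the toy reward are all assumptions made for illustration. In particular, the reward design described in this paper (marginal robustness gains combined with cross-type trade-offs) is not reproduced here; a noisy placeholder reward stands in for it.

```python
import math
import random


def ucb1_select(counts, mean_rewards, t, c=2.0):
    """Pick an arm by the standard UCB1 rule (hypothetical helper).

    counts[a]       -- how many times arm a has been pulled so far
    mean_rewards[a] -- running mean reward of arm a
    t               -- current round, starting at 1
    c               -- exploration constant (assumed value, not from the paper)
    """
    # Pull every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Mean reward plus an exploration bonus that shrinks as an arm is pulled more.
    scores = [
        mean_rewards[a] + math.sqrt(c * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(reward_fn, num_arms, steps, c=2.0):
    """Run UCB1 for `steps` rounds and return (pull counts, mean rewards)."""
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, means, t, c)
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    attack_types = ["type_a", "type_b", "type_c"]  # hypothetical labels
    assumed_gain = [0.2, 0.5, 0.8]  # made-up expected rewards for the toy run

    def toy_reward(arm):
        # Placeholder: in a real fine-tuning loop this would be measured
        # after training on a batch perturbed with the chosen attack type.
        return assumed_gain[arm] + rng.gauss(0.0, 0.05)

    counts, means = run_bandit(toy_reward, len(attack_types), steps=300)
    for name, n, m in zip(attack_types, counts, means):
        print(f"{name}: pulled {n} times, mean reward {m:.3f}")
```

In an actual fine-tuning loop, `reward_fn` would run one training step with the selected attack type and return a measured reward, so the rewards would change over time as the model changes; the stationary toy reward above ignores that non-stationarity.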
