New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class

Yanyun Wang; Li Liu; Zi Liang; Yi R. Fung; Qingqing Ye; Haibo Hu

TL;DR

The paper tackles the persistent accuracy-robustness trade-off in adversarial training by challenging the standard assumption that benign and adversarial samples should share the same class. It introduces a dummy-class paradigm that appends a dummy class for each original class, paired with two-hot soft labels and a runtime projection to recover original predictions, thereby decoupling clean and robust objectives. The proposed method, DUCAT, provides a practical, plug-and-play training objective that yields consistent gains in clean accuracy and robustness across CIFAR-10/100 and Tiny-ImageNet on multiple backbones, outperforming 18 state-of-the-art trade-off methods. This approach offers a scalable, hardware-friendly route to more robust models in real-world settings, with open-source code and broad empirical validation across threat models and datasets.

Abstract

Adversarial Training (AT) is one of the most effective methods for enhancing the robustness of Deep Neural Networks (DNNs). However, existing AT methods suffer from an inherent accuracy-robustness trade-off. Previous works have studied this issue under the current AT paradigm, but they still incur an accuracy reduction of over 10% without significant robustness improvement over simple baselines such as PGD-AT. This inherent trade-off raises a question: does the current AT paradigm, which assumes that corresponding benign and adversarial samples should be learned as the same class, inappropriately mix clean and robust objectives that may be essentially inconsistent? In fact, our empirical results show that up to 40% of CIFAR-10 adversarial samples always fail to satisfy this assumption across various AT methods and robust models, explicitly indicating room for improvement in the current AT paradigm. To relax this overly strict assumption and the tension between clean and robust learning, we propose a new AT paradigm that introduces an additional dummy class for each original class, aiming to accommodate hard adversarial samples whose distribution shifts after perturbation. Robustness w.r.t. these adversarial samples can be achieved by runtime recovery from the predicted dummy classes to the corresponding original ones, without conflicting with the clean objective of accuracy on benign samples. Finally, based on our new paradigm, we propose a novel DUmmy Classes-based Adversarial Training (DUCAT) method that concurrently improves accuracy and robustness in a plug-and-play manner, involving only the logits, the loss, and a proposed two-hot soft label-based supervised signal. Our method outperforms state-of-the-art (SOTA) benchmarks, effectively releasing the current trade-off. The code is available at https://github.com/FlaAI/DUCAT.
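The two mechanisms named in the abstract, a two-hot soft label and runtime recovery from dummy classes, can be illustrated with a minimal sketch. This is not the DUCAT implementation. The index layout (outputs `0..C-1` for original classes, `C..2C-1` for their dummy classes), the `primary_weight` parameter, and the choice to put the larger weight on the dummy class for adversarial samples are all assumptions made for illustration; the function names `two_hot_label` and `project_to_original` are hypothetical.

```python
from typing import List, Sequence


def two_hot_label(true_class: int, num_classes: int,
                  primary_weight: float, is_adversarial: bool) -> List[float]:
    """Build a soft label over 2 * num_classes outputs with two non-zero entries.

    Assumed layout: index k is original class k, index num_classes + k is the
    dummy class paired with it. The weight split is a hypothetical parameter.
    """
    if not 0 <= true_class < num_classes:
        raise ValueError("true_class out of range")
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must lie in [0, 1]")

    label = [0.0] * (2 * num_classes)
    original_idx = true_class
    dummy_idx = num_classes + true_class

    # Assumption: benign samples lean on the original class,
    # adversarial samples lean on the paired dummy class.
    if is_adversarial:
        label[dummy_idx] = primary_weight
        label[original_idx] = 1.0 - primary_weight
    else:
        label[original_idx] = primary_weight
        label[dummy_idx] = 1.0 - primary_weight
    return label


def project_to_original(logits: Sequence[float], num_classes: int) -> int:
    """Map the arg-max over 2 * num_classes logits back to an original class."""
    if len(logits) != 2 * num_classes:
        raise ValueError("expected 2 * num_classes logits")
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    # Under the assumed layout, a dummy index C + k folds back to class k.
    return predicted % num_classes
```

Under these assumptions, a prediction that lands on a dummy class still counts as a correct robust prediction once it is folded back to its original class, which is how the clean and robust objectives avoid competing for the same output unit.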
