New Paradigm of Adversarial Training: Releasing Accuracy-Robustness Trade-Off via Dummy Class

Yanyun Wang, Li Liu, Zi Liang, Yi R. Fung, Qingqing Ye, Haibo Hu

TL;DR

The paper tackles the persistent accuracy-robustness trade-off in adversarial training by challenging the standard assumption that benign and adversarial samples should share the same class. It introduces a dummy-class paradigm that appends a dummy class for each original class, paired with two-hot soft labels and a runtime projection to recover original predictions, thereby decoupling clean and robust objectives. The proposed method, DUCAT, provides a practical, plug-and-play training objective that yields consistent gains in clean accuracy and robustness across CIFAR-10/100 and Tiny-ImageNet on multiple backbones, outperforming 18 state-of-the-art trade-off methods. This approach offers a scalable, hardware-friendly route to more robust models in real-world settings, with open-source code and broad empirical validation across threat models and datasets.

Abstract

Adversarial Training (AT) is one of the most effective methods to enhance the robustness of Deep Neural Networks (DNNs). However, existing AT methods suffer from an inherent accuracy-robustness trade-off. Previous works have studied this issue under the current AT paradigm, but still face over 10% accuracy reduction without significant robustness improvement over simple baselines such as PGD-AT. This raises the question of whether the current AT paradigm, which assumes that corresponding benign and adversarial samples should be learned as the same class, inappropriately mixes clean and robust objectives that may be fundamentally inconsistent. In fact, our empirical results show that up to 40% of CIFAR-10 adversarial samples always fail to satisfy this assumption across various AT methods and robust models, indicating room for improvement in the current AT paradigm. To relax this overly strict assumption and the tension between clean and robust learning, we propose a new AT paradigm that introduces an additional dummy class for each original class, aiming to accommodate hard adversarial samples whose distribution has shifted after perturbation. Robustness w.r.t. these adversarial samples can be achieved by runtime recovery from the predicted dummy classes to the corresponding original ones, without conflicting with the clean objective of accuracy on benign samples. Finally, based on our new paradigm, we propose a novel DUmmy Classes-based Adversarial Training (DUCAT) method that concurrently improves accuracy and robustness in a plug-and-play manner, touching only the logits, the loss, and a proposed two-hot soft label-based supervised signal. Our method outperforms state-of-the-art (SOTA) benchmarks, effectively releasing the current trade-off. The code is available at https://github.com/FlaAI/DUCAT.
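
To make the "two-hot soft label" idea in the abstract more concrete, the sketch below builds a length-2C target that splits probability mass between an original class and its paired dummy class, and evaluates a soft-label cross-entropy on 2C logits. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 0-based indexing (dummy of class y at index y + C), and the weight 0.75 are all hypothetical, and which slot carries the larger weight for benign versus adversarial inputs is left as a parameter rather than taken from the paper.

```python
import math


def two_hot_label(y, num_classes, primary_weight, on_dummy):
    """Build a length-2C soft label for original class y (0-based).

    Assumption for illustration: the dummy class paired with y sits at
    index y + C. `primary_weight` goes to the dummy slot when `on_dummy`
    is True, otherwise to the original slot; the remainder goes to the
    other slot of the pair.
    """
    if not 0 <= y < num_classes:
        raise ValueError("y must satisfy 0 <= y < num_classes")
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must lie in [0, 1]")
    label = [0.0] * (2 * num_classes)
    primary = y + num_classes if on_dummy else y
    secondary = y if on_dummy else y + num_classes
    label[primary] = primary_weight
    label[secondary] = 1.0 - primary_weight
    return label


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft target."""
    if len(logits) != len(soft_label):
        raise ValueError("logits and soft_label must have equal length")
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for v, t in zip(logits, soft_label))


# Hypothetical usage: C = 3 classes, true class 1, 0.75 on the original slot.
target = two_hot_label(1, 3, 0.75, on_dummy=False)
loss = soft_cross_entropy([0.0] * 6, target)
```

Because the target only changes the label vector and the width of the output layer, a sketch like this is consistent with the abstract's claim that the method touches only logits, loss, and the supervised signal.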

Paper Structure

This paper contains 37 sections, 12 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: Conceptual difference between conventional AT and ours. The current AT paradigm assumes that the adversarial sample $\mathbf{x}'$ should be assigned to the same class as the benign sample $\mathbf{x}$, which may be overstrict. In this work, we suggest introducing more dummy classes one-to-one corresponding to original ones, so that some hard $\mathbf{x}'$ with shifted distribution can be accommodated without significantly hurting clean learning on $\mathbf{x}$.
  • Figure 2: Comparison of the proposed DUCAT under our new paradigm of adversarial training and PGD-AT under the conventional paradigm. Currently, DNNs are adversarially trained to directly classify unseen inference-time $\mathbf{x}'$ to the same class as $\mathbf{x}$. In contrast, we suggest $C$ more dummy classes as in Figure 1, along with a uniquely designed two-hot soft label-based learning to bridge them with original classes, so that DNNs can also assign $\mathbf{x}'$ to dummy classes and ensure robustness on them by inference-time recovery from predicted $[C\!+\!1\,...\,2C]$ to original $[1\,...\,C]$. This new paradigm relaxes the overstrict current assumption, releasing the accuracy-robustness trade-off it causes.
  • Figure 3: High overlap between adversarial samples evading four AT benchmarks, implying that such failures more likely stem from an inappropriate learning objective under the current paradigm than from any specific AT method.
  • Figure 4: Robust models already enhanced by different AT methods are still highly likely to be uniformly beaten by the successful adversarial samples generated based on any one of these models. Such a deconstruction of adversarial transferability between robust models reveals the model-independent vulnerability of certain samples, which further supports our deduction for the current AT paradigm.
  • Figure 5: The logic flow of our motivation to introduce dummy classes in the new AT paradigm.
  • ...and 8 more figures
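
The inference-time recovery described in the Figure 2 caption, from predicted $[C\!+\!1\,...\,2C]$ back to original $[1\,...\,C]$, can be sketched as an argmax over all $2C$ logits followed by an index fold. This is a minimal sketch assuming 0-based indices, so dummy index $y + C$ maps to original class $y$; the function name `recover_prediction` is hypothetical and not taken from the released code.

```python
def recover_prediction(logits, num_classes):
    """Map an argmax over 2C logits back to an original class index.

    Assumption for illustration: indices 0..C-1 are original classes and
    index y + C is the dummy class paired with original class y.
    """
    if len(logits) != 2 * num_classes:
        raise ValueError("expected exactly 2 * num_classes logits")
    predicted = max(range(len(logits)), key=logits.__getitem__)
    # Original predictions pass through; dummy predictions fold back by C.
    return predicted % num_classes


# Hypothetical usage with C = 3: a dummy-class hit and an original-class hit
# both resolve to original class 1.
from_dummy = recover_prediction([0.1, 0.2, 0.3, 0.1, 2.0, 0.0], 3)
from_original = recover_prediction([0.1, 0.9, 0.3, 0.1, 0.2, 0.0], 3)
```

The fold adds no learned parameters, so a model trained with $2C$ outputs still exposes a $C$-way prediction to downstream code.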