Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks

Zhenyu Liu, Haoran Duan, Huizhi Liang, Yang Long, Vaclav Snasel, Guiseppe Nicosia, Rajiv Ranjan, Varun Ojha

TL;DR

This paper tackles the problem of robust adversarial training by addressing robust overfitting in static-label defenses and suboptimal clean accuracy when using KL-divergence losses. It introduces Dynamic Label Adversarial Training (DYNAT), where a guiding model produces dynamic labels $l^{nat}$ from clean inputs to supervise a target model via cross-entropy loss, while a weak-to-strong learning progression and an inner optimization generate dynamic adversarial examples. The approach jointly trains the guiding and target models with a balance parameter $\beta$, and augments standard adversarial training with a dynamic inner optimization that adapts adversarial generation as labels strengthen. Empirical results on CIFAR-10/100 with architectures like WRN34-10 and ResNet-18 demonstrate superior clean accuracy and robust performance against strong attacks (PGD, C&W, AA) compared with baselines and variants, including GAIRAT and MAIL. The findings indicate that dynamic labels enable robust learning without requiring the guiding model to be vastly larger, offering practical gains in robustness across diverse attack scenarios.

Abstract

Adversarial training is one of the most effective methods for enhancing model robustness. Recent approaches incorporate adversarial distillation into adversarial training architectures. However, we notice two aspects of existing defense methods that limit their performance: (1) previous methods primarily use static ground truth for adversarial training, which often causes robust overfitting; (2) the loss functions are either Mean Squared Error or KL-divergence, leading to sub-optimal clean accuracy. To solve these problems, we propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gradually and dynamically gain robustness from the guiding model's decisions. Additionally, we found that a budgeted dimension of inner optimization for the target model may contribute to the trade-off between clean accuracy and robust accuracy. Therefore, we propose a novel inner optimization method to be incorporated into adversarial training. This enables the target model to adaptively search for adversarial examples based on dynamic labels from the guiding model, contributing to the robustness of the target model. Extensive experiments validate the superior performance of our approach.
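
The core idea summarized above, replacing static ground-truth labels with labels taken from the guiding model, can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the helper names (`softmax`, `cross_entropy`, `dynamic_label`, `joint_loss`) are hypothetical, and the additive form `loss_guide + beta * loss_target` is an assumption, since the summary only says that a balance parameter is used without giving the exact objective.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a hard (one-hot) label."""
    return -math.log(softmax(logits)[label])


def dynamic_label(guide_logits):
    """Winner-takes-all: index of the largest guiding-model logit."""
    return max(range(len(guide_logits)), key=lambda k: guide_logits[k])


def joint_loss(guide_logits_nat, target_logits_adv, true_label, beta):
    """Assumed joint objective (hypothetical form).

    The guiding model is supervised by the static ground-truth label on the
    clean input; the target model is supervised by the guiding model's
    dynamic label on the adversarial input. `beta` balances the two terms.
    """
    l_nat = dynamic_label(guide_logits_nat)
    loss_guide = cross_entropy(guide_logits_nat, true_label)
    loss_target = cross_entropy(target_logits_adv, l_nat)
    return loss_guide + beta * loss_target, l_nat
```

Early in training the guiding model's predictions can disagree with the ground truth, so the dynamic label is a weaker supervisory signal; as the guiding model's own loss falls, its labels approach the ground truth, which is one way to read the weak-to-strong progression described in the summary.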

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Proposed dynamic label adversarial training (DYNAT) of deep learning models. DYNAT explicitly gives flexibility on the loss functions for adversarial training, whose dynamic labels come from the guiding model. We train the guiding model/network using a dataset with static (ground truth) labels (blue labels, $label_1, label_2, \ldots$) and concurrently use the class with the maximum value in the guiding model's logits as the dynamic labels (orange labels, $l_1^{nat}, l_2^{nat}, \ldots$) for adversarial training of the target model/network. The guiding model $f^g(x_i^{nat})$ takes a clean image $x_i^{nat}$ as its input and produces a softmax probability vector, which is converted into a label $l_i^{nat}$ using a one-hot or winner-takes-all principle. The target model uses this label to compute the cross-entropy $\mathcal{L}_t(f^t(x_i^{adv}), l_i^{nat})$ between its output $f^t(x_i^{adv})$ on the adversarial image $x_i^{adv}$ and the dynamic label. (The adversarial example generated in each training iteration by the inner optimization is shown in Fig. 2.) This cross-entropy loss on the dynamic label backpropagates to the target model for its dynamic label adversarial training. The dynamic label strength increases from weak to strong as the training loss $\mathcal{L}_t$ (Eq. \ref{eq:teacher_student_loss}) is minimized iteratively.
  • Figure 2: Our inner optimization framework. The natural images are fed into fixed target and guiding models. Then, we use our strategy to extract dynamic labels from the guiding model's outputs (orange labels). We encourage the target model's outputs and the dynamic labels to participate together in adversarial example generation via the cross-entropy function. The generated adversarial example is fed back to the target model for dynamic label adversarial training, i.e., the outer optimization shown in Fig. 1. That is, Fig. 1 and Fig. 2 are snapshots of the same training iteration: the target model in Fig. 2 (see the snow symbol in Fig. 2, indicating that the target model's parameters are frozen) first produces an adversarial example based on the dynamic label from the guiding model, and Fig. 1 takes this adversarial example in the same training iteration to perform adversarial training (see the fire symbol in Fig. 1, indicating that the target model's parameters are updated).
  • Figure 3: Epoch-wise performance of DYNAT compared with LBGAT on the CIFAR-10 dataset. The target (student) model/network was trained for 100 epochs. (a) Train accuracy of the target model and (b) test accuracy of the target model. Gray lines represent the LBGAT method, and blue/red lines represent our proposed DYNAT method.
  • Figure 4: Performance of DYNAT on CIFAR-10 and CIFAR-100. (a) Comparison of DYNAT with the other defense methods, such as AT, TRADES, MART, etc., using the WRN34-10 network on the CIFAR-10 dataset. (b) Comparison of DYNAT with the other defense methods, such as AT, TRADES, SAT, etc., using the WRN34-10 network on the CIFAR-100 dataset. In each plot, the y-axis represents the clean accuracy.
  • Figure 5: Performance of the WRN34-10 target model on the CIFAR-10 dataset. Given the guiding model (ResNet-18), the target models (WRN34-10) were obtained using the GAIRAT method (shown in light-colored bars) and a combination of our method with GAIRAT (shown in dark-colored bars).
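
The inner optimization described in the captions of Figures 1 and 2 can be illustrated with a toy example. The sketch below assumes a standard L-infinity PGD loop (sign-gradient ascent steps projected onto an eps-ball around the clean input), which may differ from the paper's exact inner optimization. The linear softmax models, the function names, and the values of `eps`, `alpha`, and `steps` are hypothetical choices made only so the input gradient can be written by hand without a deep learning library.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def linear_logits(weights, bias, x):
    """Logits of a toy linear classifier: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def ce_grad_wrt_input(weights, bias, x, label):
    """Gradient of the cross-entropy w.r.t. x for the linear model: W^T (p - onehot)."""
    probs = softmax(linear_logits(weights, bias, x))
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(residual[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def inner_optimise(target, guide, x_nat, eps=8 / 255, alpha=2 / 255, steps=10):
    """Generate an adversarial example against a frozen target model.

    The label being attacked is the guiding model's dynamic label on the
    clean input, not the ground-truth label. Neither model is updated here.
    """
    t_w, t_b = target
    g_w, g_b = guide
    guide_logits = linear_logits(g_w, g_b, x_nat)
    l_nat = max(range(len(guide_logits)), key=guide_logits.__getitem__)

    x_adv = list(x_nat)
    for _ in range(steps):
        grad = ce_grad_wrt_input(t_w, t_b, x_adv, l_nat)
        # Ascent step: increase the target's loss on the dynamic label.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xn - eps), xn + eps) for xa, xn in zip(x_adv, x_nat)]
        # Keep pixel values in the valid [0, 1] range.
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv, l_nat
```

In the outer optimization, the returned `x_adv` and `l_nat` would then be used to compute the target model's cross-entropy loss and update its parameters, matching the frozen (inner) and trainable (outer) roles of the target model in the two figures.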