Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks
Zhenyu Liu, Haoran Duan, Huizhi Liang, Yang Long, Vaclav Snasel, Guiseppe Nicosia, Rajiv Ranjan, Varun Ojha
TL;DR
This paper tackles robust adversarial training by addressing two problems: robust overfitting in defenses that rely on static labels, and suboptimal clean accuracy when training with KL-divergence losses. It introduces Dynamic Label Adversarial Training (DYNAT), in which a guiding model produces dynamic labels $l^{nat}$ from clean inputs to supervise a target model via a cross-entropy loss. These labels progress from weak to strong as the guiding model learns, and a dynamic inner optimization generates adversarial examples against them, so adversarial generation adapts as the labels strengthen. The guiding and target models are trained jointly using a balance parameter $\beta$. Empirical results on CIFAR-10/100 with architectures such as WRN34-10 and ResNet-18 show higher clean accuracy and stronger robustness against strong attacks (PGD, C&W, AA) than baselines and variants, including GAIRAT and MAIL. The findings indicate that dynamic labels enable robust learning without requiring the guiding model to be much larger than the target model, yielding practical robustness gains across diverse attack scenarios.
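The training loop summarized above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: both models are hypothetical linear softmax classifiers so that gradients can be written by hand, the dynamic labels are assumed to be the guiding model's hard argmax predictions, the inner optimization is assumed to be $L_\infty$ PGD on the target model's cross-entropy against those labels, and the joint objective is assumed to be $\mathrm{CE}(g(x), y) + \beta\,\mathrm{CE}(f(x^{adv}), l^{nat})$. The names `dynat_step`, `dynamic_labels`, and `pgd_against_labels`, and the values of `eps`, `alpha`, `steps`, and `lr`, are illustrative.

```python
import numpy as np


def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    # Mean cross-entropy of integer class labels.
    n = logits.shape[0]
    return -np.mean(log_softmax(logits)[np.arange(n), labels])


def ce_grads(W, x, labels):
    # Gradients of cross_entropy(x @ W, labels) for a linear softmax model:
    # dL/dlogits = (softmax - onehot) / n, dL/dW = x^T dlogits, dL/dx = dlogits W^T.
    n = x.shape[0]
    d_logits = np.exp(log_softmax(x @ W))
    d_logits[np.arange(n), labels] -= 1.0
    d_logits /= n
    return x.T @ d_logits, d_logits @ W.T


def dynamic_labels(W_guide, x_clean):
    # Assumption: dynamic labels l_nat are the guiding model's argmax on clean inputs.
    return np.argmax(x_clean @ W_guide, axis=1)


def pgd_against_labels(W_target, x, labels, eps=0.1, alpha=0.025, steps=10):
    # Assumed inner optimization: L_inf PGD that ascends the target model's CE
    # with respect to the dynamic labels (no random start, no data-range clip).
    x_adv = x.copy()
    for _ in range(steps):
        _, grad_x = ce_grads(W_target, x_adv, labels)
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv


def dynat_step(W_guide, W_target, x, y, beta=1.0, lr=0.05):
    # One hypothetical DYNAT-style full-batch update.
    l_nat = dynamic_labels(W_guide, x)              # 1) dynamic labels from clean x
    x_adv = pgd_against_labels(W_target, x, l_nat)  # 2) inner maximization
    # 3) assumed joint objective: CE(g(x), y) + beta * CE(f(x_adv), l_nat)
    loss = cross_entropy(x @ W_guide, y) + beta * cross_entropy(x_adv @ W_target, l_nat)
    grad_guide, _ = ce_grads(W_guide, x, y)
    grad_target, _ = ce_grads(W_target, x_adv, l_nat)
    W_guide = W_guide - lr * grad_guide
    W_target = W_target - lr * beta * grad_target
    return W_guide, W_target, float(loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 120, 4, 3
    centers = 2.0 * rng.normal(size=(k, d))  # synthetic class centres
    y = rng.integers(0, k, size=n)
    x = centers[y] + 0.5 * rng.normal(size=(n, d))
    W_guide, W_target = np.zeros((d, k)), np.zeros((d, k))
    for epoch in range(30):
        W_guide, W_target, loss = dynat_step(W_guide, W_target, x, y)
    print("final joint loss:", loss)
```

Starting both weight matrices at zero makes the weak-to-strong progression visible in this toy setting: the guiding model initially assigns every input to class 0, and its labels only become informative as it fits the clean data, at which point the target model's adversarial examples are generated against increasingly accurate labels.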
Abstract
Adversarial training is one of the most effective methods for enhancing model robustness. Recent approaches incorporate adversarial distillation into adversarial training architectures. However, we identify two limitations of existing defense methods: (1) they primarily use static ground-truth labels for adversarial training, which often causes robust overfitting; (2) their loss functions are either Mean Squared Error or KL divergence, which leads to suboptimal clean accuracy. To solve these problems, we propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gradually and dynamically gain robustness from the guiding model's decisions. Additionally, we found that the budget allotted to the target model's inner optimization may affect the trade-off between clean accuracy and robust accuracy. Therefore, we propose a novel inner optimization method to be incorporated into adversarial training. It enables the target model to adaptively search for adversarial examples based on dynamic labels from the guiding model, contributing to the target model's robustness. Extensive experiments validate the superior performance of our approach.
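To make the loss-function contrast in point (2) concrete, the three kinds of supervision can be computed on the same logits. This is a hedged NumPy sketch: MSE is taken between probability vectors and KL is taken as KL(p_teacher || p_student), which are common distillation choices rather than the exact losses of any particular method, and the "dynamic label" is assumed to be the guiding model's argmax class at the current step. The function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def mse_loss(student_logits, teacher_logits):
    # Mean squared error between the two probability vectors.
    p_s = np.exp(log_softmax(student_logits))
    p_t = np.exp(log_softmax(teacher_logits))
    return float(np.mean((p_s - p_t) ** 2))


def kl_loss(student_logits, teacher_logits):
    # KL(p_teacher || p_student), summed over classes and averaged over the batch.
    log_p_t, log_p_s = log_softmax(teacher_logits), log_softmax(student_logits)
    return float(np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1)))


def ce_dynamic_label_loss(student_logits, teacher_logits):
    # Cross-entropy against the teacher's current argmax class (a hard dynamic label).
    labels = np.argmax(teacher_logits, axis=1)
    n = student_logits.shape[0]
    return float(-np.mean(log_softmax(student_logits)[np.arange(n), labels]))


teacher = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
student = teacher.copy()  # a student that exactly matches the teacher's soft output
for name, fn in [("MSE", mse_loss), ("KL", kl_loss), ("CE(dynamic label)", ce_dynamic_label_loss)]:
    print(name, fn(student, teacher))
```

When the student exactly reproduces the teacher's soft output, the MSE and KL terms are zero, while the cross-entropy term stays positive and still rewards a more confident prediction of the teacher's class. This illustrates one way the objectives differ; it is not offered as the paper's explanation of the clean-accuracy gap.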
