Improving Sustainability of Adversarial Examples in Class-Incremental Learning
Taifeng Liu, Xinjing Liu, Liangqiu Dong, Yang Liu, Yilong Yang, Zhuo Ma
TL;DR
This work tackles the instability of adversarial examples in Class-Incremental Learning (CIL) caused by domain drift as models are updated with new classes. It introduces SAE, a framework that stabilizes adversarial semantics by anchoring them to universal target-class semantics via a CLIP-based Semantic Correction Module and by grounding optimization in the initial CIL model, plus a Filtering-and-Augmentation Module to remove ambiguous semantics. The method combines a CLIP-driven semantic objective with a CIL-based surrogate loss, and refines candidate examples through semantic filtering and augmentation to produce sustainable perturbations that remain effective across evolving CIL models. Across CIFAR-100 and ImageNet-100, SAE achieves an average SASR improvement of 31.28% over baselines, with strong robustness across target classes and perturbation budgets, while maintaining perceptual indistinguishability. These results highlight a practical pathway to evaluate and enhance adversarial threats in dynamic learning environments, emphasizing the role of universal semantic anchors and CIL-aware optimization.
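As a rough illustration of how a CLIP-driven semantic objective might be combined with a CIL-based surrogate loss, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the image and class-text embeddings are already computed, treats temperature-scaled cosine similarities as logits, and mixes the two terms with a weight. The names `combined_loss`, `lam`, and `temperature` are hypothetical and do not come from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a single logit vector against an integer target class."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

def combined_loss(image_emb, text_embs, cil_logits, target, lam=1.0, temperature=0.07):
    """Hypothetical objective: semantic term plus weighted surrogate term.

    image_emb  : (d,) embedding of the adversarial example (assumed precomputed)
    text_embs  : (num_classes, d) class text embeddings (assumed precomputed)
    cil_logits : (num_classes,) logits from the initial CIL model
    """
    # Semantic term: pull the example toward the target class text embedding
    # and away from every other class.
    sims = cosine_similarity(image_emb[None, :], text_embs)[0] / temperature
    semantic = softmax_cross_entropy(sims, target)
    # Surrogate term: targeted cross-entropy on the initial CIL model's logits.
    surrogate = softmax_cross_entropy(cil_logits, target)
    return semantic + lam * surrogate
```

Under these assumptions, an example whose embedding is aligned with the target class yields a lower loss than one aligned with a different class, so minimizing the loss over the perturbation moves the example toward the target semantics.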
Abstract
Current adversarial examples (AEs) are typically designed for static models. However, with the wide adoption of Class-Incremental Learning (CIL), models are no longer static: they must be updated with new data whose distribution and labels differ from those of the old data. As a result, existing AEs often fail after CIL updates because of significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to make AE semantics more robust to domain drift by making them more similar to the target class while distinguishing them from all other classes. Achieving this is challenging, because relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve this problem, we propose a Semantic Correction Module. This module encourages the AE semantics to generalize by drawing on a vision-language model capable of producing universal semantics. Additionally, it incorporates the CIL model to correct the optimization direction of the AE semantics, guiding them closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples with target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when the model is updated with a 9-fold increase in the number of classes.
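The filtering-and-augmentation step can be pictured with a small NumPy sketch. The abstract does not specify the filtering criterion or the augmentations, so everything here is an assumption for illustration: cosine similarity to a target-class anchor with a fixed `threshold` stands in for the latent-space test, and a random horizontal flip plus small Gaussian noise stands in for the augmentation. The function names are hypothetical.

```python
import numpy as np

def filter_by_target_semantics(embs, labels, target_anchor, target, threshold=0.5):
    """Return indices of non-target examples whose embedding lies near the target anchor.

    embs          : (n, d) latent embeddings of candidate examples
    labels        : (n,) integer class labels
    target_anchor : (d,) embedding standing in for the target-class semantics
    """
    embs_n = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    anchor_n = target_anchor / np.linalg.norm(target_anchor)
    sims = embs_n @ anchor_n  # cosine similarity of each example to the anchor
    mask = (labels != target) & (sims >= threshold)
    return np.flatnonzero(mask)

def augment(images, rng, noise_std=0.01):
    """Toy augmentation (an assumption): random horizontal flip plus small noise.

    images : (n, H, W) or (n, H, W, C) array with values in [0, 1]
    """
    flip = rng.random(len(images)) < 0.5
    out = images.copy()
    out[flip] = out[flip][:, :, ::-1]  # reverse the width axis of selected images
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)  # keep pixel values in the valid range
```

A usage example: with a target class `0` and anchor `[1, 0]`, only the examples labeled as other classes but pointing in nearly the same direction as the anchor are selected, and those would then be passed to `augment`.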
