Improving Sustainability of Adversarial Examples in Class-Incremental Learning

Taifeng Liu, Xinjing Liu, Liangqiu Dong, Yang Liu, Yilong Yang, Zhuo Ma

TL;DR

This work tackles the instability of adversarial examples in Class-Incremental Learning (CIL), where domain drift occurs as the model is updated with new classes. It introduces SAE, a framework that stabilizes adversarial semantics in two ways. First, a CLIP-based Semantic Correction Module anchors them to universal target-class semantics while grounding the optimization in the initial CIL model. Second, a Filtering-and-Augmentation Module removes ambiguous semantics. Concretely, the method combines a CLIP-driven semantic objective with a CIL-based surrogate loss. It then refines candidate examples through semantic filtering and augmentation, yielding sustainable perturbations that remain effective as the CIL model evolves. On CIFAR-100 and ImageNet-100, SAE improves the average SASR over baselines by 31.28%. It also remains robust across target classes and perturbation budgets while keeping the adversarial examples perceptually indistinguishable. These results point to a practical way to evaluate and strengthen adversarial threats in dynamic learning environments, and they highlight the value of universal semantic anchors and CIL-aware optimization.
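The TL;DR describes an objective that combines a CLIP-driven semantic term with a CIL-based surrogate loss. Below is a minimal sketch of how such a combined objective could be computed. It assumes that CLIP-style image and per-class text embeddings, plus the initial CIL model's logits, are already available as plain vectors. The function names, the weight `lam`, the temperature `0.07`, and the exact form of the semantic term are illustrative assumptions, not the paper's published loss. A real attack would backpropagate such a loss through both models to update the perturbation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def combined_loss(
    image_emb: Sequence[float],
    class_text_embs: List[Sequence[float]],
    cil_logits: Sequence[float],
    target: int,
    lam: float = 1.0,          # hypothetical weight on the surrogate term
    temperature: float = 0.07,  # assumed CLIP-style temperature
) -> float:
    """Hypothetical SAE-style objective: semantic term + lam * surrogate term.

    Semantic term (assumed form): softmax cross-entropy over temperature-scaled
    cosine similarities between the AE's image embedding and one text embedding
    per class. Minimizing it pulls the AE toward the target class and away from
    every other class.
    Surrogate term: targeted cross-entropy on the initial CIL model's logits.
    """
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    semantic = cross_entropy(sims, target)
    surrogate = cross_entropy(cil_logits, target)
    return semantic + lam * surrogate


# Toy two-class example in which class 0 is the target class.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned = combined_loss([0.9, 0.1], text_embs, [2.0, 0.0], target=0)
misaligned = combined_loss([0.1, 0.9], text_embs, [0.0, 2.0], target=0)
print(aligned < misaligned)
```

In this sketch, the semantic term contrasts the target class against all other classes, mirroring the stated goal of making AE semantics similar to the target class while distinguishing them from the rest. The surrogate term ties the optimization to the initial CIL model.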

Abstract

Current adversarial examples (AEs) are typically designed for static models. However, with the wide application of Class-Incremental Learning (CIL), models are no longer static: they must be updated with new data whose distribution and labels differ from those of the old data. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to make AE semantics robust to domain drift by making them more similar to the target class while distinguishing them from all other classes. Achieving this is challenging, because relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve this problem, we propose a Semantic Correction Module. This module encourages the AE semantics to generalize by leveraging a vision-language model capable of producing universal semantics. Additionally, it incorporates the CIL model to correct the optimization direction of the AE semantics, guiding them closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples with target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when the CIL model is updated with a 9-fold increase in the number of classes.
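The abstract describes the Filtering-and-Augmentation Module only at a high level: find non-target examples that carry target-class semantics in the latent space, then augment them. Below is a minimal sketch of that two-step idea. It assumes latent feature vectors and a target-class prototype are already given. The cosine-similarity threshold, the flip-plus-Gaussian-noise augmentations, and all names (`filter_target_like`, `augment`, `threshold`, `n_aug`, `sigma`) are hypothetical choices for illustration, not the paper's actual criteria. How the augmented examples feed back into AE optimization is not specified here, so the sketch stops at producing them.

```python
import math
import random
from typing import List, Sequence

Image = List[List[float]]  # toy grayscale image: a list of pixel rows


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_target_like(
    latents: List[Sequence[float]],
    labels: List[int],
    target_proto: Sequence[float],
    target: int,
    threshold: float = 0.8,  # hypothetical similarity cutoff
) -> List[int]:
    """Indices of NON-target examples whose latent is close to the target prototype."""
    return [
        i
        for i, (z, y) in enumerate(zip(latents, labels))
        if y != target and cosine(z, target_proto) >= threshold
    ]


def hflip(img: Image) -> Image:
    """Mirror each pixel row (horizontal flip)."""
    return [row[::-1] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add independent Gaussian noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def augment(images: List[Image], indices: List[int], n_aug: int = 2,
            sigma: float = 0.01, seed: int = 0) -> List[Image]:
    """Create n_aug randomly flipped, noised copies of each selected image."""
    rng = random.Random(seed)
    out: List[Image] = []
    for i in indices:
        for _ in range(n_aug):
            img = hflip(images[i]) if rng.random() < 0.5 else images[i]
            out.append(add_noise(img, sigma, rng))
    return out


# Toy data: class 0 is the target; example 1 is non-target but target-like.
latents = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
labels = [0, 1, 2]
images = [[[0.0, 1.0]], [[0.2, 0.8]], [[0.5, 0.5]]]
selected = filter_target_like(latents, labels, target_proto=[1.0, 0.0], target=0)
augmented = augment(images, selected, n_aug=3)
print(selected, len(augmented))  # prints: [1] 3
```

The filter deliberately excludes true target-class examples, matching the abstract's focus on *non-target* examples that already sit near the target class in latent space.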

Paper Structure

This paper contains 31 sections, 5 equations, 25 figures, 4 tables, 1 algorithm.

Figures (25)

  • Figure 1: Attack success rate (ASR) and Grad-CAM visualizations of different targeted adversarial attacks against CIL. The X-axis denotes the number of learned classes in CIL; the model architecture is ResNet-32 on CIFAR-100.
  • Figure 2: The overview of SAE.
  • Figure 3: ASR curves for both baseline attacks and our attack across various CIL methods. Each subfigure illustrates the ASR across incremental tasks. Subfigures (a)–(e) present results for the CIFAR-100 dataset with 'skyscraper' as the target class. Subfigures (f)–(j) show the corresponding results for the ImageNet-100 dataset using 'candy store' as the target class.
  • Figure 4: Perturbation constraints and SASR on CIFAR-100. The target class is 'skyscraper'.
  • Figure 5: ASR curves for different CIL settings on CIFAR-100. The left plot illustrates ASR across five tasks, with CIL learning 20 new classes per task. The right plot illustrates ASR across six tasks, where CIL initially learns 50 classes, followed by incremental learning of 10 new classes per task.
  • ...and 20 more figures