Attack On Prompt: Backdoor Attack in Prompt-Based Continual Learning

Trang Nguyen; Anh Tran; Nhat Ho

TL;DR

This work reveals a targeted backdoor vulnerability in prompt-based continual learning under multi-supplier data. It introduces AOP, a backdoor framework with three components: a surrogate dataset combined with manipulated prompt selection to transfer backdoor knowledge to other suppliers' data, a static-dynamic trigger-optimization scheme that keeps the trigger resilient during incremental learning, and a sigmoid binary cross-entropy loss that keeps the trigger from devolving into adversarial noise. The approach achieves high attack success rates (up to 100%) with minimal impact on clean accuracy across multiple prompt-based continual learners, and demonstrates transferability across surrogate datasets and task orders. The study also evaluates defenses and discusses practical implications for data privacy and model security in continual learning, while acknowledging limitations and suggesting avenues for future threat models and defenses.

Abstract

Prompt-based approaches offer a cutting-edge solution to data privacy issues in continual learning, particularly in scenarios involving multiple data suppliers where long-term storage of private user data is prohibited. Although these approaches deliver state-of-the-art performance, their impressive remembering capability can become a double-edged sword, raising security concerns because the model might inadvertently retain poisoned knowledge injected while learning from private user data. Following this insight, in this paper, we expose continual learning to a potential threat: a backdoor attack, which drives the model to follow a desired adversarial target whenever a specific trigger is present while still performing normally on clean samples. We highlight three critical challenges in executing backdoor attacks on incremental learners and propose corresponding solutions: (1) \emph{Transferability}: We employ a surrogate dataset and manipulate prompt selection to transfer backdoor knowledge to data from other suppliers; (2) \emph{Resiliency}: We simulate static and dynamic states of the victim to ensure the backdoor trigger remains robust during intense incremental learning processes; and (3) \emph{Authenticity}: We apply binary cross-entropy loss as an anti-cheating factor to prevent the backdoor trigger from devolving into adversarial noise. Extensive experiments across various benchmark datasets and continual learners validate our continual backdoor framework, achieving up to $100\%$ attack success rate, with further ablation studies confirming the effectiveness of each contribution.
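
The Authenticity point contrasts binary cross-entropy with the usual softmax cross-entropy. The sketch below is only an illustration of one well-known difference between the two losses, not the authors' implementation: softmax cross-entropy depends only on relative logits (it is unchanged when every logit is shifted by the same constant), whereas sigmoid binary cross-entropy scores each class logit on its own absolute value. Whether this is exactly the property AOP relies on is an assumption here; the function names and the example logits are hypothetical.

```python
import math


def softmax_ce(logits, target):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def sigmoid_bce(logits, target):
    """Mean per-class binary cross-entropy against a one-hot target."""
    total = 0.0
    for k, v in enumerate(logits):
        y = 1.0 if k == target else 0.0
        # softplus(v) - y * v is a numerically stable BCE-with-logits form.
        softplus = max(v, 0.0) + math.log1p(math.exp(-abs(v)))
        total += softplus - y * v
    return total / len(logits)


# Hypothetical logits for a 3-class problem; class 0 is the attack target.
logits = [2.0, -1.0, 0.5]
shifted = [v + 3.0 for v in logits]  # same ranking, different absolute scores

ce_gap = abs(softmax_ce(logits, 0) - softmax_ce(shifted, 0))
bce_gap = abs(sigmoid_bce(logits, 0) - sigmoid_bce(shifted, 0))
print(f"softmax CE change under shift:  {ce_gap:.6f}")
print(f"sigmoid BCE change under shift: {bce_gap:.6f}")
```

In this toy setting the softmax loss cannot tell the two logit vectors apart, while the sigmoid loss penalizes the raised non-target scores, which is one way a per-class loss can constrain what a trigger is allowed to do.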

Paper Structure (29 sections, 2 equations, 9 figures, 6 tables, 3 algorithms)

Introduction
Background
Backdoor Attack on Prompt-based Continual Learning (AOP)
Threat Model and Notations
Prompt Selection, Label Mapping, and Transferability
Static-dynamic Trigger Optimization
Towards an Authentic Backdoor Trigger
Experiments
Experimental Setup
Effectiveness of AOP
Conclusion
Related Work
Continual learning
Prompt-based continual learning
Backdoor attack
...and 14 more sections

Figures (9)

Figure 1: Multi-supplier data scenario in prompt-based continual learning, with one supplier acting as an adversarial attacker.
Figure 2: (a) and (b): AOP's prompt selection frequency on benign and triggered samples when attacking DualPrompt. (d) and (e): AOP's average key-query similarities concerning benign and triggered samples when attacking DualPrompt-PGP. (c) and (f): Scores obtained from the clean model for AOP's triggered samples optimized with CE and BCE, respectively.
Figure 3: AOP framework. The backdoor trigger and prompt pool are updated using the static-dynamic strategy. Clean and poisoned data are mapped to corresponding prompts, guiding the pretrained model to behave normally on clean inputs while misclassifying triggered inputs according to the adversary's target.
Figure 4: ASR when varying number of dynamic rounds.
Figure 5: Average ASR after each task during the attack on L2P and L2P-PGP on CUB200. The figure compares the performance of LC (Label Consistent; Turner et al., 2019), Narcissus (Zeng et al., 2022), and AOP (ours).
...and 4 more figures
