Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning

Xiaopeng Xie, Ming Yan, Xiwen Zhou, Chenlong Zhao, Suli Wang, Yong Zhang, Joey Tianyi Zhou

TL;DR

This work investigates backdoor attacks in prompt-based learning, focusing on clean-label settings where triggers are covert and do not require label changes. It introduces Contrastive Shortcut Injection (CSI), which combines non-robust data selection and automatic trigger design, guided by model logits, to create strong shortcut features linking triggers to a targeted label. Across full-shot and few-shot text classification tasks on multiple datasets and models, CSI achieves high attack success rates at very low poisoning rates while maintaining low false-trigger rates, revealing a pronounced vulnerability in prompt-based fine-tuning. The findings underscore the need for defense strategies against covert clean-label backdoors in practical NLP deployments.

Abstract

The prompt-based learning paradigm has demonstrated remarkable efficacy in enhancing the adaptability of pretrained language models (PLMs), particularly in few-shot scenarios. However, this paradigm has been shown to be vulnerable to backdoor attacks. The current clean-label attack, which employs a specific prompt as a trigger, can succeed without external triggers while keeping poisoned samples correctly labeled, which makes it stealthier than poisoned-label attacks. On the other hand, it suffers from frequent false activations and is harder to mount, requiring a higher poisoning rate. Using conventional negative data augmentation methods, we found that it is challenging to balance effectiveness and stealthiness in a clean-label setting. To address this issue, we are inspired by the notion that a backdoor acts as a shortcut, and we posit that this shortcut stems from the contrast between the trigger and the data used for poisoning. In this study, we propose Contrastive Shortcut Injection (CSI), a method that leverages activation values to integrate trigger design and data selection strategies and thereby craft stronger shortcut features. With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates. Notably, we found that the two strategies play leading roles in the full-shot and few-shot settings, respectively.
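As a rough illustration of the data-selection idea described above, the following Python sketch ranks clean-label candidates by the model's logit for the target label. It rests on assumptions not confirmed by this page: that "non-robust" samples are those target-class samples for which the model is least confident in the target label, and that per-sample logits are already available. The function name `select_poison_candidates` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_poison_candidates(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    target_label: int,
    budget: int,
) -> List[int]:
    """Return indices of up to `budget` samples to poison (hypothetical helper).

    Assumption: in a clean-label attack only samples whose true label already
    equals the target label are eligible, and the "non-robust" ones are those
    with the lowest target-label logit.
    """
    # Clean-label constraint: labels are never changed, so restrict to the target class.
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    # Sort ascending by the target-label logit: least confident samples first.
    candidates.sort(key=lambda i: logits[i][target_label])
    return candidates[:budget]


if __name__ == "__main__":
    # Toy logits for a binary task; column 1 is the assumed target label.
    toy_logits = [[0.1, 2.0], [1.0, 0.2], [0.5, 0.4], [3.0, 0.1]]
    toy_labels = [1, 1, 1, 0]
    print(select_poison_candidates(toy_logits, toy_labels, target_label=1, budget=2))
```

Under this reading, attaching the trigger to samples the model is unsure about gives the trigger the strongest contrast with the data, which is the intuition the abstract attributes to CSI. The actual selection criterion used in the paper may differ.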

Paper Structure (14 sections, 9 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: The benign accuracy (BA) and attack success rate (ASR) of ProAttack under negative data augmentation, as a function of the poisoning rate on the target class, on the SST-2 dataset.
  • Figure 2: Effective Clean-Label Textual Attack
  • Figure 3: Effective Clean-Label Textual Attack
  • Figure 4: The ASR, Average FTR and C-ACC of ProAttack with respect to the poisoning rate on SST-2 and OLID datasets.
  • Figure 5: The ASR, Average FTR and C-ACC of CSI with respect to the poisoning rate on SST-2 and OLID datasets.
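The figure captions above report ASR, Average FTR, and C-ACC. The sketch below shows one way these quantities could be computed from model predictions. It assumes the definitions common in the backdoor literature: C-ACC as accuracy on clean inputs, ASR as the share of non-target samples classified as the target label when the trigger is present, and FTR as the same share measured under a non-trigger prompt, averaged over several such prompts. These definitions and all function names are assumptions, not taken from this page.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """C-ACC (assumed definition): fraction of clean inputs classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def target_rate_on_non_target(
    preds: Sequence[int], labels: Sequence[int], target_label: int
) -> float:
    """Fraction of non-target-class samples predicted as the target label.

    With predictions made under the trigger this is the ASR; with predictions
    made under a non-trigger prompt it is that prompt's FTR (assumed reading).
    """
    flags = [p == target_label for p, y in zip(preds, labels) if y != target_label]
    return sum(flags) / len(flags) if flags else 0.0


def average_ftr(
    preds_per_prompt: Sequence[Sequence[int]],
    labels: Sequence[int],
    target_label: int,
) -> float:
    """Mean FTR over several non-trigger prompts (hypothetical helper)."""
    rates = [
        target_rate_on_non_target(preds, labels, target_label)
        for preds in preds_per_prompt
    ]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    labels = [0, 0, 1, 0]
    print("C-ACC:", clean_accuracy([0, 0, 1, 1], labels))
    print("ASR:", target_rate_on_non_target([1, 1, 1, 0], labels, target_label=1))
    print("Avg FTR:", average_ftr([[0, 0, 1, 0], [1, 0, 1, 0]], labels, target_label=1))
```

An effective and stealthy attack in this framing would show a high ASR together with a low Average FTR and a C-ACC close to that of an unpoisoned model.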