SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer

Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Zhuosheng Zhang, Gongshen Liu

TL;DR

This work investigates the vulnerability of pre-trained language models (PLMs) to task-agnostic backdoors and makes two contributions: maxEntropy, an entropy-based poisoning filter that defends against hidden backdoors, and SynGhost, an invisible, universal backdoor attack that is embedded during pre-training via syntactic transfer. SynGhost uses a three-stage pipeline (syntactic weaponization, syntax-aware injection, and syntactic activation), coupled with sentinel models and adaptive contrastive learning, to implant multiple backdoors while preserving pre-training performance and enabling transfer to diverse downstream tasks. Across GLUE and various PLMs, SynGhost achieves high attack success rates with controlled degradation of clean accuracy and remains robust against defenses such as Onion and fine-pruning, highlighting a practical security risk. The paper further analyzes the backdoor mechanisms through frequency, attention, and representation studies, and discusses limitations (e.g., reliance on a syntactic weaponization method, and model scale) and avenues for future work, including extension to text generation and larger models.

Abstract

Although pre-training achieves remarkable performance, it suffers from task-agnostic backdoor attacks due to vulnerabilities in data and training mechanisms. These attacks can transfer backdoors to various downstream tasks. In this paper, we introduce $\mathtt{maxEntropy}$, an entropy-based poisoning filter that mitigates such risks. To overcome the limitations of manual target setting and explicit triggers, we propose $\mathtt{SynGhost}$, an invisible and universal task-agnostic backdoor attack via syntactic transfer, further exposing vulnerabilities in pre-trained language models (PLMs). First, $\mathtt{SynGhost}$ injects multiple syntactic backdoors into the pre-training space through corpus poisoning, while preserving the PLM's pre-training capabilities. Second, $\mathtt{SynGhost}$ adaptively selects optimal targets based on contrastive learning, creating a uniform distribution in the pre-training space. To identify syntactic differences, we also introduce an awareness module that minimizes interference between backdoors. Experiments show that $\mathtt{SynGhost}$ poses significant threats and can transfer to various downstream tasks. Furthermore, $\mathtt{SynGhost}$ resists defenses based on perplexity, fine-pruning, and $\mathtt{maxEntropy}$. The code is available at https://github.com/Zhou-CyberSecurity-AI/SynGhost.
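
The abstract does not spell out how $\mathtt{maxEntropy}$ scores samples, so the following is only a minimal sketch of the generic quantity an entropy-based filter works with: the Shannon entropy of a classifier's predicted distribution. The function names, the threshold, and the choice to flag low-entropy (overconfident) predictions as suspicious are illustrative assumptions, not the paper's method.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def flag_low_entropy(batch_logits, threshold):
    """Return indices of samples whose prediction entropy is below threshold.

    Assumption: treating low-entropy predictions as suspicious is a
    hypothetical filtering rule chosen for illustration only.
    """
    flagged = []
    for i, logits in enumerate(batch_logits):
        if prediction_entropy(softmax(logits)) < threshold:
            flagged.append(i)
    return flagged
```

For example, `flag_low_entropy([[0.0, 0.0], [10.0, -10.0]], threshold=0.1)` flags only the second sample: the first has a uniform two-class prediction (entropy $\ln 2$), while the second is almost fully confident in one class (entropy close to zero).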

Paper Structure

This paper contains 41 sections, 15 equations, 16 figures, and 12 tables.

Figures (16)

  • Figure 1: Performance differences of existing task-agnostic backdoor attacks fine-tuned by users on the Offenseval task under $\mathtt{maxEntropy}$.
  • Figure 2: $\mathtt{SynGhost}$ consists of three phases: (1) syntactic weaponization exploits paraphrase models to poison the corpus; (2) syntax-aware injection uses three constraints to embed multiple syntactic backdoors into PLMs; (3) syntactic activation enables the implicit transfer of backdoors from the PLM to downstream tasks.
  • Figure 3: Analysis of collusion attack in $\mathtt{SynGhost}$.
  • Figure 4: Distribution of prediction entropy and performance differences on $\mathtt{maxEntropy}$ for $\mathtt{SynGhost}$.
  • Figure 5: Frequency analysis of backdoor mapping.
  • ...and 11 more figures