Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu

TL;DR

This work asks whether synthetic data used in LLM training can propagate unsafe content introduced by poisoning or backdoor attacks. It first shows that existing attacks fail to spread: poisoned topics occupy only a tiny region of the query distribution, so very little poisoned content ends up in the synthetic data. To overcome this, it introduces the Virus Infection Attack (VIA), a universal framework that identifies vulnerable hijacking points in benign samples and wraps the payload in a stealthy shell so that poisoning propagates through those samples. VIA significantly raises the infection rate in synthetic data and in downstream models, at the cost of a marginal drop in attack success on the upstream model. Its effectiveness across multiple scenarios underscores the security risks tied to synthetic data and motivates stronger defenses, including detection strategies and mitigations for multi-generational propagation.

Abstract

Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widely adopted in LLM development, the potential security risks it may introduce remain uninvestigated. This paper systematically evaluates the resilience of the synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that this paradigm exhibits strong resistance to existing attacks, primarily because the poisoning data and the queries used to generate synthetic samples follow different distributions. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, the Virus Infection Attack (VIA), which enables existing attacks to propagate through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.
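
As a rough illustration of what "the presence of poisoning content in synthetic data" could mean in practice, the sketch below computes the fraction of synthetic samples that contain a given payload string. This is an assumption made for illustration: the function name `infection_rate` and the case-insensitive substring criterion are hypothetical and are not taken from the paper, which may measure infection differently.

```python
def infection_rate(synthetic_samples, payload):
    """Fraction of samples whose text contains the payload.

    Assumption (not from the paper): a sample counts as infected
    when the payload appears as a case-insensitive substring.
    """
    if not synthetic_samples:
        return 0.0
    needle = payload.lower()
    hits = sum(1 for sample in synthetic_samples if needle in sample.lower())
    return hits / len(synthetic_samples)
```

A maintainer could run a check like this over generated samples for known payload strings before reusing them for training; a plain substring match will miss paraphrased payloads.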

Paper Structure

This paper contains 16 sections, 23 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: An Example Workflow of Synthetic-Data-Based Training on Poisoned Upstream Models, where the threat model assumes that the adversary cannot control the distribution of maintainer B's query set when poisoning.
  • Figure 2: Performance Comparison of Poisoned Upstream Model's Attack Success Rate (ASR) and Synthetic Data's Infection Rate (IR) under Different Data Poisoning Rates, which measures the effectiveness of vanilla poisoning/backdoor attacks (red) versus their enhanced versions with our VIA frameworks (blue and light cyan). While VIA causes a marginal decrease in ASR, it significantly enhances the infection capability of current poisoning methods.
  • Figure 3: Semantic Visualization of Query Distributions across 10,000 samples from three SFT datasets: alignment (hh-rlhf), instruction tuning (Tulu-3), and math (OpenO1). The black stars in the four subfigures represent the positions of poisoning-related queries. Overall, the distribution of poisoning content occupies a significantly smaller portion of the query space compared to its proportion in the full training dataset, which largely explains the failure of current poisoning attacks to propagate into the downstream model.
  • Figure 4: An Overview of Virus Infection Attack (VIA) on LLMs, which consists of two key steps: i) Hijacking Point Search (HPS), which analyzes existing SFT datasets to identify the phrases most vulnerable to hijacking; and ii) Shell Construction (SC), which builds a protective shell around the targeted poisoning text (i.e., the payload) to minimize the influence of data poisoning.
  • Figure 5: HPS Score Distribution of the Top 50 High-Frequency 3-Grams in the Tulu-3 dataset, where blue bars and red bars indicate the frequencies and HPS scores of the corresponding 3-grams, respectively.
  • ...and 8 more figures
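
The frequency side of Figure 5 (the most common 3-grams in a dataset) can be approximated with a simple count, sketched below. The function name `top_ngrams` and the lowercase whitespace tokenization are assumptions made for illustration. The HPS score itself is not defined in this summary, so it is not computed here.

```python
from collections import Counter


def top_ngrams(texts, n=3, k=50):
    """Return the k most frequent word n-grams with their counts.

    Assumption (not from the paper): tokens are lowercase,
    whitespace-separated words.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of n tokens; texts shorter than n yield nothing.
        counts.update(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(k)
```

For example, `top_ngrams(["here is a list", "here is a plan"], n=3, k=1)` reports `("here", "is", "a")` with a count of 2.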

Theorems & Definitions (2)

  • proof
  • proof