Multi-Faceted Studies on Data Poisoning can Advance LLM Development

Pengfei He, Yue Xing, Han Xu, Zhen Xiang, Jiliang Tang

TL;DR

This paper argues that threat-centric data poisoning studies are insufficient for the complex, multi-stage LLM lifecycle. It proposes three integrated perspectives—practical threat-centric, trust-centric, and mechanism-centric data poisoning—to capture real-world risks, defensive applications, and mechanistic insights. The authors outline concrete strategies such as poisoning data-collection pipelines, ownership-trigger mechanisms, and mechanism-focused probes of CoT and memorization, illustrating how poisoning can be repurposed for robustness and understanding. By reframing data poisoning as a diagnostic and defense tool, the work aims to guide safer, more trustworthy, and better-understood LLM deployment.

Abstract

The lifecycle of large language models (LLMs) is far more complex than that of traditional machine learning models, involving multiple training stages, diverse data sources, and varied inference methods. While prior research on data poisoning attacks has primarily focused on the safety vulnerabilities of LLMs, these attacks face significant challenges in practice. Secure data collection, rigorous data cleaning, and the multistage nature of LLM training make it difficult to inject poisoned data or reliably influence LLM behavior as intended. Given these challenges, this position paper proposes rethinking the role of data poisoning and argues that multi-faceted studies on data poisoning can advance LLM development. From a threat perspective, practical strategies for data poisoning attacks can help evaluate and address real safety risks to LLMs. From a trustworthiness perspective, data poisoning can be leveraged to build more robust LLMs by uncovering and mitigating hidden biases, harmful outputs, and hallucinations. Moreover, from a mechanism perspective, data poisoning can provide valuable insights into LLMs, particularly the interplay between data and model behavior, driving a deeper understanding of their underlying mechanisms.

Paper Structure

This paper contains 17 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: An illustration of this paper's structure. (Left) An overview of the LLM lifecycle, including multiple training and inference stages (Section \ref{sec:lifecycle}). (Middle) Introduction of threat-centric data poisoning and its challenges (Section \ref{sec:limitation}). (Right) An overview of the multi-faceted study on data poisoning, including practical threat-centric (Section \ref{section:threat}), trust-centric (Section \ref{section:trust}), and mechanism-centric data poisoning (Section \ref{section:mechanism}).
  • Figure 2: A systematic overview of an LLM's development lifecycle, including training stages (pre-training, instruction tuning, preference learning) and various inference stages such as fine-tuning, training-free inference-time adaptation, and retrieval-based applications (shown inside the right brace). The data sources involved in each stage are also shown.