Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau

TL;DR

This paper reveals a security vulnerability in state-of-the-art agentic fact-checking systems, which decompose complex claims into sub-claims and produce justifications for their verdicts. It introduces Fact2Fiction, a two-agent poisoning framework (Planner and Executor) that mimics the agentic verification process, exploits system justifications to craft targeted malicious evidences, and allocates the poisoning budget to the most influential sub-claims. Evaluated on the AVeriTeC benchmark against two real-world systems, DEFAME and InFact, Fact2Fiction achieves 8.9%–21.2% higher attack success rates than the prior PoisonedRAG approach while using substantially fewer malicious evidences. Because the justifications that make these systems transparent are also what the attack exploits, the results point to a transparency-security trade-off. The work also evaluates existing defenses against Fact2Fiction, finds them insufficient, and highlights the urgent need for robust countermeasures to safeguard automated fact-checking in practical deployment.

Abstract

State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, but remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the decomposition strategy and exploit system-generated justifications to craft tailored malicious evidences that compromise sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than SOTA attacks across various poisoning budgets and exposes security weaknesses in existing fact-checking systems, highlighting the need for defensive countermeasures.
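
The TL;DR mentions allocating the poisoning budget to the most influential sub-claims. The paper's actual allocation rule is not given in this summary, so the following is only a minimal sketch of one plausible reading: an integer budget split in proportion to per-sub-claim influence weights, using the largest-remainder method. The function name `allocate_budget`, the sub-claim labels, and the weights are all hypothetical.

```python
from typing import Dict


def allocate_budget(weights: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split an integer budget across items in proportion to their weights.

    Assumption: proportional allocation with largest-remainder rounding.
    This is an illustrative stand-in, not the rule used in the paper.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")

    # Ideal (fractional) share for each item.
    exact = {k: budget * w / total for k, w in weights.items()}
    # Round down first; int() floors here because every share is >= 0.
    shares = {k: int(v) for k, v in exact.items()}
    leftover = budget - sum(shares.values())
    # Give the remaining units to the items with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - shares[k], reverse=True)
    for k in by_remainder[:leftover]:
        shares[k] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical influence weights for three sub-claims of one claim.
    influence = {"sub_claim_1": 3, "sub_claim_2": 2, "sub_claim_3": 1}
    print(allocate_budget(influence, budget=10))
```

Largest-remainder rounding is chosen here only because it guarantees the shares sum exactly to the budget; any other rounding scheme with that property would serve the illustration equally well.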

Paper Structure

This paper contains 42 sections, 1 equation, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of our Fact2Fiction attack framework.
  • Figure 2: ASR trend (y-axis) across poison rates (x-axis).
  • Figure 3: Perplexity distribution comparison.