Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems

Scott Thornton

Abstract

Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with external knowledge sources, but the retrieval pipeline introduces new attack surfaces. In particular, adversaries can poison retrieval corpora so that malicious documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that require no modification to the underlying LLM. We implement dual-document poisoning attacks consisting of a sleeper document and a trigger document, optimized using Greedy Coordinate Gradient (GCG). In a large-scale evaluation on the Security Stack Exchange corpus (67,941 documents) with 50 attack attempts, gradient-guided poisoning achieves a 38.0 percent co-retrieval rate under pure vector retrieval. We show that a simple architectural modification, hybrid retrieval combining BM25 and vector similarity, substantially mitigates this attack: across all 50 attacks, it reduces gradient-guided attack success from 38 percent to 0 percent without modifying the model or retraining the retriever. When attackers jointly optimize payloads for both sparse and dense retrieval signals, they can partially circumvent hybrid retrieval, achieving 20 to 44 percent success, but hybrid retrieval still substantially raises attack difficulty relative to vector-only retrieval. Evaluation across five LLMs (GPT-5.3, GPT-4o, Claude Sonnet 4.6, Llama 4, and GPT-4o-mini) shows attack success ranging from 46.7 percent to 93.3 percent. Cross-corpus evaluation on the FEVER Wikipedia dataset (25 attacks) yields 0 percent attack success across all retrieval configurations.
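The abstract describes hybrid retrieval as combining BM25 with vector similarity, and Figure 2 uses a weight α where α = 1.0 means pure vector retrieval. The sketch below is one plausible reading of that setup, not the paper's implementation. It assumes a convex combination of min-max-normalized scores, fused = α · dense + (1 - α) · sparse, and uses Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75. The helper names (`bm25_scores`, `hybrid_rank`), the toy documents, and the hand-picked 2-D vectors are hypothetical stand-ins for a real corpus and embedding model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each doc (lowercased whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:  # Counter returns 0 for unseen terms
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5, k=2):
    """Return indices of the top-k docs by alpha*dense + (1-alpha)*sparse.

    alpha=1.0 reduces to pure vector retrieval (assumed convention)."""
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    sparse = min_max(bm25_scores(query, docs))
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return order[:k]


# Toy corpus: doc 2 stands in for an adversarial document whose (hypothetical)
# embedding sits close to the query but shares no query terms.
docs = [
    "how to rotate ssh keys on a linux server",       # benign, strong lexical match
    "configure ssh key rotation policy for servers",  # benign, partial match
    "zq xv token salad optimized for embeddings",     # stand-in poisoned doc
]
query = "rotate ssh keys linux"
query_vec = [1.0, 0.0]
doc_vecs = [[0.6, 0.8], [0.5, 0.85], [0.99, 0.1]]  # hand-picked toy "embeddings"

print(hybrid_rank(query, query_vec, docs, doc_vecs, alpha=1.0, k=1))  # -> [2]
print(hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5, k=1))  # -> [0]
```

In this toy, the stand-in poisoned document wins under pure vector retrieval because its vector is closest to the query. At α = 0.5, its lack of lexical overlap with the query pulls its fused score below the benign match. This illustrates one intuition for why the sparse signal helps; it does not reproduce the paper's measurements.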

Paper Structure (35 sections, 4 equations, 2 figures, 11 tables)

Introduction
Related Work
RAG Poisoning Attacks
Hybrid Retrieval Systems
Adversarial ML Foundations
Threat Model and Attack Design
Threat Model
Dual-Document Poisoning Attack
Sleeper Document
Trigger Document
Attack Scenarios
Optimization Strategy
Detection Framework (Exploratory)
Static Document Analysis
Behavioral Detection
...and 20 more sections

Figures (2)

Figure 1: Comprehensive attack-defense analysis across corpora. (A) Attack effectiveness: on Security SE, attacks achieve 66.7% stealth, but low co-retrieval (44.4%) limits overall success to 11.1%; on FEVER, attacks achieve 100% co-retrieval but 0% stealth. (B) Detection F1 scores: QPD provides the best cross-corpus signal; keyword anomaly detection excels on FEVER but fails on Security SE. (C, D) ROC curves show near-perfect detection on FEVER versus near-random detection on Security SE for the keyword and semantic methods, confirming the corpus-dependent detection gap.
Figure 2: Hybrid retrieval defense effectiveness. Pure vector retrieval ($\alpha = 1.0$) yields 38% co-retrieval success; all hybrid configurations ($\alpha = 0.3$, $0.5$, $0.7$) yield 0%. The drop is statistically significant ($\chi^2 = 21.05$, $p < 10^{-6}$, Cohen's $h = 1.33$).
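Figure 2 reports the drop from 38% to 0% co-retrieval together with χ² = 21.05 and Cohen's h = 1.33. The sketch below recomputes both statistics under stated assumptions: 19 of 50 attacks succeed under pure vector retrieval (38% of 50), 0 of 50 succeed under a single hybrid configuration, and χ² is Pearson's test with Yates' continuity correction on the resulting 2×2 table. These assumed counts reproduce the reported χ² and h, but the exact test used in the paper is not stated in this section. The helpers `rate`, `yates_chi2`, and `cohens_h` are hypothetical names.

```python
import math


def rate(outcomes):
    """Fraction of attack attempts that succeeded (e.g. sleeper and trigger co-retrieved)."""
    return sum(outcomes) / len(outcomes)


def yates_chi2(s1, n1, s2, n2):
    """Pearson chi-square with Yates continuity correction for a 2x2 success/failure table."""
    f1, f2 = n1 - s1, n2 - s2
    n = n1 + n2
    # Correction shrinks |ad - bc| by n/2, floored at zero.
    diff = max(0.0, abs(s1 * f2 - f1 * s2) - n / 2)
    return n * diff ** 2 / (n1 * n2 * (s1 + s2) * (f1 + f2))


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-square-root transformed proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


# Assumed per-attack outcomes (True = co-retrieval success).
vector_only = [True] * 19 + [False] * 31  # 19/50 = 38%
hybrid = [False] * 50                     # 0/50 = 0%

p_vec, p_hyb = rate(vector_only), rate(hybrid)
chi2 = yates_chi2(sum(vector_only), len(vector_only), sum(hybrid), len(hybrid))
h = cohens_h(p_vec, p_hyb)
print(f"rates: {p_vec:.2f} vs {p_hyb:.2f}; chi2 = {chi2:.2f}; h = {h:.2f}")
# -> rates: 0.38 vs 0.00; chi2 = 21.05; h = 1.33
```

Because one proportion is zero, Cohen's h reduces to 2 arcsin √0.38 ≈ 1.33, which is well above the conventional 0.8 threshold for a large effect.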
