Table of Contents
Fetching ...

RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems

KrishnaSaiReddy Patil

Abstract

RAG systems deployed across federal agencies for citizen-facing services are vulnerable to knowledge base poisoning attacks, where adversaries inject malicious documents to manipulate outputs. Recent work demonstrates that as few as 10 adversarial passages can achieve 98.2% retrieval success rates. We observe that RAG knowledge base poisoning is structurally analogous to software supply chain attacks, and propose RAGShield, a five-layer defense-in-depth framework applying supply chain provenance verification to the RAG knowledge pipeline. RAGShield introduces: (1) C2PA-inspired cryptographic document attestation blocking unsigned and forged documents at ingestion; (2) trust-weighted retrieval prioritizing provenance-verified sources; (3) a formal taint lattice with cross-source contradiction detection catching insider threats even when provenance is valid; (4) provenance-aware generation with auditable citations; and (5) NIST SP 800-53 compliance mapping across 15 control families. Evaluation on a 500-passage Natural Questions corpus with 63 attack documents and 200 queries against five adversary tiers achieves 0.0% attack success rate including adaptive attacks (95% CI: [0.0%, 1.9%]) with 0.0% false positive rate. We honestly report that insider in-place replacement attacks achieve 17.5% ASR, identifying the fundamental limit of ingestion-time defense. The cross-source contradiction detector catches subtle numerical manipulation attacks that bypass provenance verification entirely.

RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems

Abstract

RAG systems deployed across federal agencies for citizen-facing services are vulnerable to knowledge base poisoning attacks, where adversaries inject malicious documents to manipulate outputs. Recent work demonstrates that as few as 10 adversarial passages can achieve 98.2% retrieval success rates. We observe that RAG knowledge base poisoning is structurally analogous to software supply chain attacks, and propose RAGShield, a five-layer defense-in-depth framework applying supply chain provenance verification to the RAG knowledge pipeline. RAGShield introduces: (1) C2PA-inspired cryptographic document attestation blocking unsigned and forged documents at ingestion; (2) trust-weighted retrieval prioritizing provenance-verified sources; (3) a formal taint lattice with cross-source contradiction detection catching insider threats even when provenance is valid; (4) provenance-aware generation with auditable citations; and (5) NIST SP 800-53 compliance mapping across 15 control families. Evaluation on a 500-passage Natural Questions corpus with 63 attack documents and 200 queries against five adversary tiers achieves 0.0% attack success rate including adaptive attacks (95% CI: [0.0%, 1.9%]) with 0.0% false positive rate. We honestly report that insider in-place replacement attacks achieve 17.5% ASR, identifying the fundamental limit of ingestion-time defense. The cross-source contradiction detector catches subtle numerical manipulation attacks that bypass provenance verification entirely.

Paper Structure

This paper contains 30 sections, 5 theorems, 4 equations, 2 figures, 11 tables.

Key Result

Theorem 1

Let $\mathcal{K}$ be a knowledge base where fraction $p$ of documents have valid provenance attestations, and let $\mathcal{A}$ be a T1 or T2 adversary who cannot produce valid signatures. Then: where $\text{ASR}_{\text{baseline}}$ is the attack success rate without RAGShield. $\blacktriangleleft$$\blacktriangleleft$

Figures (2)

  • Figure 1: RAGShield five-layer architecture with NIST 800-53 control mappings.
  • Figure 2: Taint lattice Hasse diagram. Join ($\sqcup$) moves up (less trusted); meet ($\sqcap$) moves down. Context taint = $\bigsqcup$ of component taints.

Theorems & Definitions (10)

  • Definition 1: RAG System
  • Definition 2: Adversary
  • Theorem 1: Provenance Integrity Bound
  • proof
  • Corollary 1
  • Theorem 2: Taint Soundness
  • proof
  • Theorem 3: Trust Monotonicity
  • proof
  • Proposition 1: Insider Threat Mitigation