Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention

Sagie Dekel, Moshe Tennenholtz, Oren Kurland

TL;DR

This work tackles corpus knowledge poisoning in Retrieval Augmented Generation (RAG) by identifying cross-document causal attention as a vulnerability and proposing Sparse Document Attention RAG (SDAG). SDAG enforces block-sparse attention so that tokens from different retrieved documents cannot influence one another during inference. The result is a training-free, easily integrated defense that can be combined with existing defenses. Across multiple QA benchmarks and LLM/retriever combinations, SDAG reduces the Attack Success Rate (ASR) and often improves accuracy. It sets a new state of the art against single-document attacks and, when combined with defenses such as RAGDefender, improves performance in multi-document attack scenarios. The paper also analyzes the geometry of adversarial documents in embedding space, showing that closer adversarial documents are more harmful and that SDAG tends to focus generation on the dominant subset of correct documents, which underscores its practical value for robust, real-world RAG deployments.
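
The summary reports results in terms of ASR and accuracy but does not spell out how an individual response is scored. A minimal sketch, assuming one simple rule (a response counts as attacked if it contains the attacker's target answer, and as correct if it contains the gold answer, both by case-insensitive substring match), could look like this. The function names and the matching rule are assumptions for illustration, not the paper's evaluation code.

```python
from typing import List, Tuple


def mentions(response: str, answer: str) -> bool:
    """Assumed matching rule: case-insensitive substring match."""
    return answer.strip().lower() in response.lower()


def asr_and_accuracy(responses: List[str], target_answers: List[str],
                     correct_answers: List[str]) -> Tuple[float, float]:
    """ASR = share of responses containing the attacker's target answer;
    accuracy = share of responses containing the correct answer."""
    n = len(responses)
    asr = sum(mentions(r, t) for r, t in zip(responses, target_answers)) / n
    acc = sum(mentions(r, c) for r, c in zip(responses, correct_answers)) / n
    return asr, acc


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    responses = ["The Eiffel Tower is in Rome.", "It is in Paris.", "I am not sure."]
    targets = ["Rome", "Rome", "Rome"]
    correct = ["Paris", "Paris", "Paris"]
    print(asr_and_accuracy(responses, targets, correct))  # (0.3333333333333333, 0.3333333333333333)
```

Substring matching is lenient: a response that mentions both the target and the correct answer would count toward both metrics, so stricter scoring rules are also plausible.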

Abstract

Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM's output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, particularly under attack. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG). This is a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) under a variety of attack strategies on RAG. We show that SDAG substantially outperforms the standard causal attention mechanism in terms of attack success rate. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods: the integration yields performance that is statistically significantly better than the state of the art.
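
The abstract describes SDAG only as a change to the attention mask. The sketch below shows one way such a block-sparse causal mask could be built from per-token segment labels. It assumes a hypothetical prompt layout (shared instruction tokens, then the retrieved documents, then shared question tokens), labels shared tokens 0 and document tokens 1..K, and uses an illustrative function name `sdag_mask`; none of these come from the paper's code.

```python
from typing import List


def sdag_mask(segment_ids: List[int]) -> List[List[bool]]:
    """Block-sparse causal mask in the spirit of SDAG (illustrative sketch).

    segment_ids[i] == 0 marks a shared task/context token; a value k > 0
    marks a token of retrieved document k (hypothetical labeling).
    mask[i][j] is True iff token i may attend to token j.
    """
    n = len(segment_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal constraint: only positions j <= i
            si, sj = segment_ids[i], segment_ids[j]
            # Allowed: a shared token sees any earlier token, any token sees
            # earlier shared tokens, a document token sees its own document.
            # Blocked: attention between two different documents.
            if si == 0 or sj == 0 or si == sj:
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    # 2 instruction tokens, document 1 (2 tokens), document 2 (2 tokens), 1 question token
    seg = [0, 0, 1, 1, 2, 2, 0]
    for row in sdag_mask(seg):
        print("".join("#" if allowed else "." for allowed in row))
    # Expected output ('#' = may attend, '.' = masked):
    # #......
    # ##.....
    # ###....
    # ####...
    # ##..#..
    # ##..##.
    # #######
```

In this sketch, shared tokens placed after the documents (the question and, during generation, the answer) still attend causally to every document, so evidence can be aggregated there; only document-to-document attention is removed.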

Paper Structure

This paper contains 26 sections, 1 equation, 1 figure, and 16 tables.

Figures (1)

  • Figure 1: Block-sparse attention patterns used in SDAG. Entry $(i, j)$ is colored blue if token $i$ is allowed to attend to token $j$, and white otherwise. Dark blue marks task or context tokens; light blue marks retrieved-document tokens.
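
The pattern in Figure 1 implies a concrete isolation property: under the block-sparse mask, one document's token representations cannot depend on any other document. A toy check, assuming a single attention head with identity query/key/value projections, no positional encoding, and a hypothetical layout (instructions, two documents, question), is to tamper with one document and compare outputs under a plain causal mask and under a block-sparse mask. The helpers `masked_attention` and `block_mask` are illustrative names, not the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention; masked pairs get weight 0."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(len(k))
        ]
        top = max(scores)  # finite, since every token may attend to itself
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * row[c] for w, row in zip(weights, v)) / total for c in range(len(v[0]))])
    return out


def block_mask(seg: List[int], sparse: bool) -> List[List[bool]]:
    """Causal mask; if sparse, also block attention between different documents (seg > 0)."""
    n = len(seg)
    return [[j <= i and (not sparse or seg[i] == 0 or seg[j] == 0 or seg[i] == seg[j])
             for j in range(n)] for i in range(n)]


seg = [0, 0, 1, 1, 2, 2, 0]  # instructions, doc 1, doc 2, question
x = [[0.1, 0.0], [0.2, 0.2], [0.3, 0.4], [0.4, 0.0], [0.5, 0.2], [0.6, 0.4], [0.7, 0.0]]
x_poisoned = [row[:] for row in x]
x_poisoned[2], x_poisoned[3] = [5.0, -3.0], [4.0, 4.0]  # tamper with document 1 only

for sparse in (False, True):
    mask = block_mask(seg, sparse)
    clean = masked_attention(x, x, x, mask)  # toy: identity Q/K/V projections
    poisoned = masked_attention(x_poisoned, x_poisoned, x_poisoned, mask)
    print("sparse" if sparse else "causal",
          "doc-2 outputs unchanged:", clean[4:6] == poisoned[4:6],
          "question output unchanged:", clean[6] == poisoned[6])
# causal doc-2 outputs unchanged: False question output unchanged: False
# sparse doc-2 outputs unchanged: True question output unchanged: False
```

Under the causal mask, tampering with document 1 changes the representations of document 2; under the block-sparse mask it does not, while the question token, which may still attend to all documents, is affected in both cases. With the instructions placed before the documents, the same argument carries through a multi-layer decoder layer by layer, because no layer lets document-2 tokens read document-1 states.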