ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

Xiangyu Yin, Yi Qi, Chih-Hong Cheng

Abstract

Retrieval-Augmented Generation (RAG) improves the reliability of large language model applications by grounding generation in retrieved evidence, but it also introduces a new attack surface: corpus poisoning. In this setting, an adversary injects or edits passages so that they are ranked into the Top-$K$ results for target queries and then affect downstream generation. Existing defences against corpus poisoning often rely on content filtering, auxiliary models, or generator-side reasoning, which can make deployment more difficult. We propose ProGRank, a post hoc, training-free retriever-side defence for dense-retriever RAG. ProGRank stress-tests each query–passage pair under mild randomized perturbations and extracts probe gradients from a small fixed parameter subset of the retriever. From these gradients, it derives two instability signals, representational consistency and dispersion risk, and combines them with a score gate in a reranking step. ProGRank preserves the original passage content, requires no retraining, and also supports a surrogate-based variant when the deployed retriever is unavailable. Extensive experiments across three datasets, three dense retriever backbones, representative corpus poisoning attacks, and both retrieval-stage and end-to-end settings show that ProGRank provides stronger defence performance and a favorable robustness–utility trade-off. It also remains competitive under adaptive evasive attacks.

Paper Structure (36 sections, 21 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Comparison between the undefended ranking pipeline and our proposed probe-gradient reranking. ProGRank extracts two sensitivity-based penalties, a consistency penalty and a saturated risk penalty, and fuses them with a score gate to obtain the defended reranking score, thereby suppressing poisoned passages and reducing poisoned Top-$K$ exposure.
  • Figure 2: Overall retrieval-stage results, comparing Poison Hit Rate and Poison Recall Rate under the original retriever score $s_\theta(q,p)$ and the defended reranking score $\tilde{s}_{R,\theta,\vartheta}(q,p)$.
  • Figure 3: Distribution of reranking signals for clean and poisoned passages. From left to right: $P_{\mathrm{dr}}^{R,\theta,\vartheta}(q,p)$, $P_{\mathrm{rep}}^{R,\theta,\vartheta}(q,p)$, the applied score correction $\tilde{s}_{R,\theta,\vartheta}(q,p)-s_\theta(q,p)$, and the gate value $w_\theta(q,p)$.
  • Figure 4: Rank shift induced by ProGRank. Upward and downward rank shifts are reported separately for poisoned and clean passages as Poison Up, Poison Down, Clean Up, and Clean Down. The top row shows the maximum shift magnitude and the bottom row shows the mean shift magnitude. We use $P\downarrow$ and $C\downarrow$ as shorthand for Poison Down and Clean Down, respectively.
  • Figure 5: Ablation of the reranking objective. We compare the full score in Eq. \ref{eq:rerank_score} against ablated variants obtained by removing $w_{\theta}(q,p)$, $P_{\mathrm{dr}}^{R,\theta,\vartheta}(q,p)$, $P_{\mathrm{rep}}^{R,\theta,\vartheta}(q,p)$, and their combinations.
  • ...and 1 more figure
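
The captions above name the ingredients of the defended score: a base score $s_\theta(q,p)$, a consistency penalty $P_{\mathrm{rep}}$, a saturated dispersion-risk penalty $P_{\mathrm{dr}}$, and a gate $w_\theta(q,p)$. The exact formulas are not reproduced in this listing, so the following is only a toy sketch of how such a fusion could be wired together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the "retriever" is a bilinear score with a per-dimension scale vector `theta` standing in for the small fixed parameter subset, the perturbation is Gaussian noise on the passage embedding, the penalties are a cosine-based disagreement and a `tanh`-saturated relative spread of gradient norms, the gate is a sigmoid of the base score, and the names `probe_gradients`, `penalties`, `defended_score`, `rerank`, `tau`, `lam_rep`, and `lam_dr` are hypothetical.

```python
import numpy as np

def probe_gradients(q, p, n_probes=8, noise=0.05, rng=None):
    """Gradients of the toy score s = sum(theta * q * p) w.r.t. theta,
    evaluated on noisy copies of the passage embedding p (assumed perturbation)."""
    rng = np.random.default_rng(0) if rng is None else rng
    perturbed = p + noise * rng.standard_normal((n_probes, p.shape[0]))
    # For this toy score, d s / d theta = q * p, so each row is one probe gradient.
    return q * perturbed

def penalties(grads, eps=1e-12):
    """Two hypothetical instability signals computed from the probe gradients."""
    mean_g = grads.mean(axis=0)
    norms = np.linalg.norm(grads, axis=1)
    cos = grads @ mean_g / (norms * np.linalg.norm(mean_g) + eps)
    # Assumed consistency penalty: disagreement of probes with their mean direction.
    p_rep = max(0.0, 1.0 - float(cos.mean()))
    # Assumed dispersion risk: relative spread of gradient norms, saturated by tanh.
    p_dr = float(np.tanh(norms.std() / (norms.mean() + eps)))
    return p_rep, p_dr

def defended_score(q, p, theta, tau=0.0, lam_rep=1.0, lam_dr=1.0, rng=None):
    """Return (base score, defended score) for one query-passage pair."""
    s = float(np.sum(theta * q * p))
    p_rep, p_dr = penalties(probe_gradients(q, p, rng=rng))
    # Assumed score gate: penalties bite harder on passages that already score high.
    gate = 1.0 / (1.0 + np.exp(-(s - tau)))
    return s, s - gate * (lam_rep * p_rep + lam_dr * p_dr)

def rerank(q, passages, theta, **kwargs):
    """Indices of passages sorted by defended score, best first."""
    scores = [defended_score(q, p, theta, **kwargs)[1] for p in passages]
    return [int(i) for i in np.argsort(scores)[::-1]]
```

Because both penalties are non-negative and the gate lies in (0, 1), the defended score in this sketch can only lower a passage's base score, which mirrors the idea of suppressing unstable passages without rewriting their content. With a real dense retriever, the analytic gradient would presumably be replaced by automatic differentiation over the chosen parameter subset.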