Corpus Poisoning via Approximate Greedy Gradient Descent

Jinyan Su; Preslav Nakov; Claire Cardie

TL;DR

This paper proposes Approximate Greedy Gradient Descent (AGGD), a new attack on dense retrieval systems. AGGD builds on the widely used HotFlip method for efficiently generating adversarial passages, and it selects a higher-quality set of token-level perturbations than HotFlip does.

Abstract

Dense retrievers are widely used in information retrieval and have also been successfully extended to other knowledge-intensive areas such as language models, e.g., Retrieval-Augmented Generation (RAG) systems. Unfortunately, they have recently been shown to be vulnerable to corpus poisoning attacks, in which a malicious user injects a small fraction of adversarial passages into the retrieval corpus to trick the system into returning these passages among the top-ranked results for a broad set of user queries. Further study is needed to understand the extent to which these attacks could limit the deployment of dense retrievers in real-world applications. In this work, we propose Approximate Greedy Gradient Descent (AGGD), a new attack on dense retrieval systems based on the widely used HotFlip method for efficiently generating adversarial passages. We demonstrate that AGGD can select a higher-quality set of token-level perturbations than HotFlip by replacing its random token sampling with a more structured search. Experimentally, we show that our method achieves a high attack success rate across several datasets and retrievers, and that it generalizes to unseen queries and new domains. Notably, our method is extremely effective in attacking the ANCE retrieval model, achieving attack success rates that are 15.24% and 17.44% higher than HotFlip on the NQ and MS MARCO datasets, respectively. Additionally, we demonstrate AGGD's potential to replace HotFlip in other adversarial attacks, such as knowledge poisoning of RAG systems.
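
To make the abstract's description of HotFlip concrete, the following is a minimal sketch of one HotFlip-style update with random token-position sampling. It is an illustration under stated assumptions, not the authors' implementation and not AGGD itself. The toy encoder (a passage embedding equal to the mean of its token embeddings), the dot-product similarity, and the names `attack_loss` and `hotflip_style_step` are all hypothetical. Because the toy encoder is linear, the first-order estimate here happens to be exact, which is not the case for a real dense retriever.

```python
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attack_loss(token_ids, emb, queries):
    """Negative mean similarity between the passage and the queries.

    Assumption: the passage embedding is the mean of its token embeddings.
    """
    passage_vec = mean_vec([emb[t] for t in token_ids])
    return -sum(dot(q, passage_vec) for q in queries) / len(queries)


def hotflip_style_step(token_ids, emb, queries, n_candidates, rng):
    """One update: sample a position at random, rank replacements by a
    first-order estimate of the loss change, then verify with exact losses."""
    m = len(token_ids)
    pos = rng.randrange(m)  # random token-position sampling

    # Gradient of the loss w.r.t. the embedding at `pos`; analytic for the
    # toy encoder. A real attack would obtain it by backpropagation.
    grad = [-x / m for x in mean_vec(queries)]

    # First-order (Taylor) estimate of the loss change for each vocabulary token.
    current = emb[token_ids[pos]]
    est_delta = [
        dot([e - c for e, c in zip(emb[v], current)], grad)
        for v in range(len(emb))
    ]
    candidates = sorted(range(len(emb)), key=lambda v: est_delta[v])[:n_candidates]

    # Keep the candidate with the lowest exact loss, if it improves at all.
    best_ids, best_loss = list(token_ids), attack_loss(token_ids, emb, queries)
    for v in candidates:
        trial = list(token_ids)
        trial[pos] = v
        loss = attack_loss(trial, emb, queries)
        if loss < best_loss:
            best_ids, best_loss = trial, loss
    return best_ids, best_loss
```

In this sketch a step that samples a position with no improving candidate leaves the passage unchanged, and the next step may sample the same position again. This is the kind of wasted evaluation that the abstract's "more structured search" is meant to avoid.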

Paper Structure (37 sections, 4 equations, 13 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Motivation
Corpus Poisoning Problem Setting
HotFlip Revisited
Drawbacks of HotFlip
Approximate Greedy Gradient Descent
Experiments
Experimental Details
Main Results
Analysis and Ablation Study
Extending AGGD to Knowledge Poisoning Attacks
Conclusion
Limitations and Future Work
Ethics Statement
...and 22 more sections

Figures (13)

Figure 1: Comparing the true rank of words swapped with their rank according to the gradient-based Taylor approximation. The gradient identifies the top-1 correct token to swap 9% of the time, and guesses within the top ten tokens 58% of the time.
Figure 2: A simple example of finding a 2-token sequence $a$ through HotFlip (left) and AGGD (right). If HotFlip can't find a better replacement for the currently sampled token position, there is a $\frac{1}{2}$ probability that it will sample the same token position again and redo the same evaluation, which is inefficient. Moreover, if the potential replacements for another token also don't contain a better option, HotFlip continues to loop through the same search without reducing the loss.
Figure 3: Illustration of HotFlip (top) and AGGD (bottom) and their candidate sets.
Figure 4: Experiments on Contriever with the NQ dataset illustrate that the candidate set collected by AGGD has higher overall quality (left) and is more likely to contain the best candidate (right).
Figure 5: The effect of candidate set size $n$ on the attack success rate.
...and 8 more figures
