Poisoning Retrieval Corpora by Injecting Adversarial Passages

Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen

TL;DR

This work exposes a vulnerability in dense retrieval systems by injecting a small set of adversarial passages into a corpus to manipulate top-k retrieval results for unseen queries. It introduces a gradient-based, discrete-token optimization (HotFlip-inspired) to craft adversarial passages and extends the method to generate multiple passages via clustering. Empirical results on BEIR show high attack success across in-domain and out-of-domain queries, including multi-vector retrievers like ColBERT, with targeted misinformation scenarios illustrating real-world societal risks. The paper highlights potential defenses like embedding-norm clipping and likelihood-based detection, while acknowledging limitations and ethical concerns for practical deployment.

Abstract

Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems into retrieving them for queries that were not seen by the attacker. More surprisingly, these adversarial passages can directly generalize to out-of-domain queries and corpora with a high attack success rate. For instance, we find that 50 generated passages optimized on Natural Questions can mislead >94% of questions posed in financial documents or online forums. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised. Although different systems exhibit varying levels of vulnerability, we show they can all be successfully attacked by injecting up to 500 passages, a small fraction compared to a retrieval corpus of millions of passages.
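
The token-perturbation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy encoder that mean-pools token embeddings (a real dense retriever is a transformer whose gradient must be recomputed by backpropagation at every step), and the names `hotflip_attack`, `objective`, `toy_gradient`, `top_k`, and the random toy data are all hypothetical. What it does show is the HotFlip-style loop: pick a position, rank replacement tokens by the first-order estimate `emb[t] · grad`, then keep a swap only if the true similarity to the query set increases.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def objective(token_ids, emb, query_vecs):
    """Average dot-product similarity between the passage and all queries.
    Toy encoder (assumption): the passage vector is the mean token embedding."""
    p = mean_vec([emb[t] for t in token_ids])
    return sum(dot(p, q) for q in query_vecs) / len(query_vecs)

def toy_gradient(query_vecs, length):
    """Gradient of the objective w.r.t. one token embedding. For the linear
    toy encoder this is mean(queries) / length; a real model needs autograd."""
    return [x / length for x in mean_vec(query_vecs)]

def hotflip_attack(emb, query_vecs, length=8, steps=50, top_k=5, seed=0):
    rng = random.Random(seed)
    vocab = len(emb)
    passage = [rng.randrange(vocab) for _ in range(length)]  # random start
    best = objective(passage, emb, query_vecs)
    history = [best]
    for _ in range(steps):
        pos = rng.randrange(length)            # position to perturb
        grad = toy_gradient(query_vecs, length)
        # First-order estimate of the new score for each replacement token.
        ranked = sorted(range(vocab), key=lambda t: dot(emb[t], grad),
                        reverse=True)
        for t in ranked[:top_k]:
            trial = passage[:pos] + [t] + passage[pos + 1:]
            score = objective(trial, emb, query_vecs)  # exact re-evaluation
            if score > best:                   # keep only real improvements
                passage, best = trial, score
        history.append(best)
    return passage, history

# Toy data (hypothetical): 30-token vocabulary, 4-dim embeddings, 10 queries.
data_rng = random.Random(1)
emb = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
queries = [[data_rng.gauss(0, 1) + 1.0 for _ in range(4)] for _ in range(10)]
passage, history = hotflip_attack(emb, queries)
print("similarity before:", history[0], "after:", history[-1])
```

Because a swap is accepted only when the exact objective improves, the similarity never decreases from one step to the next. To produce several passages, as the clustering extension in the TL;DR suggests, one could run the same loop once per cluster of query embeddings; that step is not shown here.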

Paper Structure (32 sections, 4 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Our proposed corpus poisoning attack. Malicious users generate adversarial passages and inject them into a retrieval corpus to mislead dense retrievers to return them as responses to user queries. The attack is highly effective on unseen queries either in-domain or out-of-domain.
  • Figure 2: Top-$20$ success rate of our attack. Left: We generate $|\mathcal{A}| \in \{1, 10, 50\}$ adversarial passages on the train sets of NQ or MS MARCO and evaluate on their held-out test queries. Right: On NQ, we show that the attack success rate can be substantially improved by generating 500 passages. Contriever: the pre-trained Contriever model (Izacard et al., 2021); Contriever-ms: Contriever fine-tuned on MS MARCO; DPR-nq: DPR trained on NQ (Karpukhin et al., 2020); DPR-mul: DPR trained on a combination of datasets; ANCE: Xiong et al. (2020).
  • Figure 3: Average token log likelihood for Wikipedia passages and 10 corresponding adversarial passages for NQ with Contriever.
  • Figure 4: Distribution of $\ell_2$-norms for embeddings of Wikipedia passages and 10 corresponding adversarial passages for NQ with Contriever.
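
The norm-based defense mentioned in the TL;DR, and the $\ell_2$-norm comparison in Figure 4, can be sketched as follows. This is only an illustration under stated assumptions: it presumes that adversarial passages show up as unusually large embedding norms, and the threshold rule (mean plus three standard deviations of clean norms), the function names, and the toy vectors are hypothetical choices rather than anything specified by the paper.

```python
import math
import statistics

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def norm_threshold(clean_embeddings, num_std=3.0):
    """Hypothetical rule: mean + num_std * std of norms of trusted passages."""
    norms = [l2_norm(v) for v in clean_embeddings]
    return statistics.mean(norms) + num_std * statistics.pstdev(norms)

def flag_outliers(embeddings, threshold):
    """Indices of passages whose embedding norm exceeds the threshold."""
    return [i for i, v in enumerate(embeddings) if l2_norm(v) > threshold]

def clip_norm(vec, max_norm):
    """Rescale a vector so that its norm is at most max_norm."""
    n = l2_norm(vec)
    if n <= max_norm or n == 0.0:
        return list(vec)
    return [x * max_norm / n for x in vec]

# Toy embeddings (hypothetical): four clean passages and one suspicious one.
clean = [[1.0, 0.0], [0.0, 1.1], [0.9, 0.0], [0.0, 1.0]]
suspicious = [3.0, 4.0]                      # norm 5.0, far above the rest
threshold = norm_threshold(clean)
flagged = flag_outliers(clean + [suspicious], threshold)
clipped = clip_norm(suspicious, threshold)
print("threshold:", threshold, "flagged indices:", flagged)
```

Flagging removes suspicious passages outright, while clipping keeps them but caps how much a large norm can inflate a dot-product score. Either choice would need to be validated against retrieval accuracy on clean data.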