Attack-in-the-Chain: Bootstrapping Large Language Models for Attacks Against Black-box Neural Ranking Models

Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

TL;DR

This work studies the robustness of neural ranking models (NRMs) under black-box adversarial conditions. It introduces Attack-in-the-Chain (AttChain), a framework that uses chain-of-thought prompting and multi-round LLM–NRM interactions to craft adversarial perturbations. The perturbations are guided by anchor documents selected via Zipf-based filtering, and the perturbation magnitude is adapted to the ranking gap. Empirical results on MS MARCO and TREC DL19 show that GPT-based AttChain variants achieve superior attack effectiveness and naturalness without requiring surrogate models, which exposes vulnerabilities in NRMs. The study demonstrates the potential of LLM-driven adversarial strategies in information retrieval and calls for defense and detection mechanisms to mitigate such threats, especially given AI-generated content in SEO contexts.

Abstract

Neural ranking models (NRMs) have been shown to be highly effective in terms of retrieval performance. Unfortunately, they have also displayed a higher degree of sensitivity to attacks than previous generation models. To help expose and address this lack of robustness, we introduce a novel ranking attack framework named Attack-in-the-Chain, which tracks interactions between large language models (LLMs) and NRMs based on chain-of-thought (CoT) prompting to generate adversarial examples under black-box settings. Our approach starts by identifying anchor documents with higher ranking positions than the target document as nodes in the reasoning chain. We then dynamically assign the number of perturbation words to each node and prompt LLMs to execute attacks. Finally, we verify the attack performance of all nodes at each reasoning step and proceed to generate the next reasoning step. Empirical results on two web search benchmarks show the effectiveness of our method.
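
The loop described in the abstract (pick anchors, assign a perturbation budget, generate, verify, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`attack_chain`, `rank_of`, `toy_score`, `toy_rewrite`) is hypothetical. The term-overlap scorer stands in for a black-box NRM, and the word-copying rewriter stands in for an LLM prompt. Anchors are simply the top-scoring documents above the target rather than the paper's Zipf-based filtering, and the budget rule (one word per rank position of gap, capped) is an assumption made only for illustration.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]       # (query, doc) -> relevance score
Rewriter = Callable[[str, str, int], str]  # (doc, anchor, budget) -> perturbed doc


def rank_of(score: Scorer, query: str, doc: str, others: List[str]) -> int:
    """1-based rank of `doc` against `others`, using only black-box scores."""
    s = score(query, doc)
    return 1 + sum(1 for d in others if score(query, d) > s)


def attack_chain(score: Scorer, rewrite: Rewriter, query: str, target: str,
                 others: List[str], n_anchors: int = 2, max_words: int = 3,
                 max_steps: int = 5) -> Tuple[str, List[str]]:
    """Iteratively perturb `target`; returns the final text and the chain of steps."""
    current, chain = target, [target]
    for _ in range(max_steps):
        cur_score = score(query, current)
        # Anchor candidates: documents that currently outrank the target, best first.
        higher = sorted((d for d in others if score(query, d) > cur_score),
                        key=lambda d: score(query, d), reverse=True)
        if not higher:  # target is already at rank 1
            break
        cur_rank = len(higher) + 1
        best, best_rank = current, cur_rank
        for pos, anchor in enumerate(higher[:n_anchors], start=1):
            # Assumed budget rule: larger ranking gap -> more perturbation words.
            budget = min(max_words, cur_rank - pos)
            candidate = rewrite(current, anchor, budget)
            # Verification step: re-query the black-box ranker.
            cand_rank = rank_of(score, query, candidate, others)
            if cand_rank < best_rank:
                best, best_rank = candidate, cand_rank
        if best_rank == cur_rank:  # no anchor improved the rank; stop
            break
        current = best
        chain.append(current)
    return current, chain


def toy_score(query: str, doc: str) -> float:
    """Toy stand-in for an NRM: number of distinct query terms in the document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_rewrite(doc: str, anchor: str, budget: int) -> str:
    """Toy stand-in for an LLM: append up to `budget` unseen anchor words."""
    seen = set(doc.lower().split())
    extra: List[str] = []
    for word in anchor.split():
        if word.lower() not in seen and len(extra) < budget:
            extra.append(word)
            seen.add(word.lower())
    return " ".join([doc] + extra)
```

In this sketch the attacker never sees the query terms the scorer rewards. It only observes rank changes after each perturbation, which mirrors the black-box setting. A real attack would replace `toy_rewrite` with an LLM call and would also need to check that the rewritten document stays natural.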

Paper Structure

This paper contains 17 sections, 2 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: The framework of the proposed method AttChain.
  • Figure 2: Distributions of log perplexity (PPL) of adversarial examples generated by AttChain$_\mathrm{GPT}$ and target documents on MS MARCO.
  • Figure 3: Distribution of cosine similarity of semantic embedding between adversarial examples generated by different attack methods and target documents on MS MARCO.