Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines

Xiyang Hu

TL;DR

This paper models ranking manipulation in LLM-based search engines as an Infinitely Repeated Prisoners' Dilemma (IRPD) between content providers, incorporating the attack success probability $p$, attack cost $c$, discount factor $\delta$, and market degradation $\beta$. It derives the cooperation threshold $\delta^* = \frac{T - R}{T - Q} = \frac{p - 2c}{p - \beta p^2 + p^2}$, where $T$, $R$, and $Q$ are the temptation, mutual-cooperation, and mutual-defection payoffs. It shows that cooperation depends non-monotonically on $p$, with finite-defect equilibria and regions where a defense that simply caps $p$ can be futile. The study extends to Tit-for-Tat trigger strategies, one-time fixed costs, asymmetric players, and multi-player settings, revealing how degradation, costs, and forward-looking behavior shape stability and suggesting targeted, ecosystem-level defenses. Practically, the results inform adaptive security designs, including cost-based deterrents and reputation mechanisms, to foster robust and fair LLM-driven information ecosystems, and they extend to other AI-driven ranking and recommendation platforms. These insights provide a theoretical basis and actionable guidance for securing future large-language-model-mediated information systems against adversarial manipulation.
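To see the non-monotonic dependence on $p$ concretely, the sketch below simply evaluates the closed-form threshold $\delta^* = \frac{p - 2c}{p - \beta p^2 + p^2}$ quoted above over a grid of attack success probabilities. The function names and the parameter values ($c = 0.05$, $\beta = 0.4$) are hypothetical choices for illustration, not values taken from the paper.

```python
def delta_star(p: float, c: float, beta: float) -> float:
    """Critical discount factor from the TL;DR formula:
    delta* = (p - 2c) / (p - beta*p^2 + p^2).

    If p <= 2c the numerator is non-positive, so delta* <= 0 and
    cooperation holds for every discount factor.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return (p - 2 * c) / (p - beta * p**2 + p**2)


def sweep(c: float, beta: float, steps: int = 100) -> list[tuple[float, float]]:
    """Evaluate delta* on the grid p = 1/steps, 2/steps, ..., 1."""
    return [(k / steps, delta_star(k / steps, c, beta)) for k in range(1, steps + 1)]


if __name__ == "__main__":
    c, beta = 0.05, 0.4  # hypothetical, illustrative values
    grid = sweep(c, beta)
    # The largest threshold on the grid marks where cooperation is hardest to sustain.
    p_peak, d_peak = max(grid, key=lambda pair: pair[1])
    print(f"delta* peaks at p = {p_peak:.2f} (delta* = {d_peak:.3f})")
    print(f"delta*(p = 1.00) = {delta_star(1.0, c, beta):.3f}")
```

With these values the threshold rises from $0$ at $p = 2c = 0.1$, peaks near $p \approx 0.52$, and then falls to $0.5625$ at $p = 1$, so the hardest case for cooperation is an intermediate success probability rather than the largest one.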

Abstract

The increasing integration of Large Language Model (LLM)-based search engines has transformed the landscape of information retrieval. However, these systems are vulnerable to adversarial attacks, especially ranking manipulation attacks, where attackers craft webpage content to manipulate the LLM's ranking and promote specific content, gaining an unfair advantage over competitors. In this paper, we study the dynamics of ranking manipulation attacks. We frame this problem as an Infinitely Repeated Prisoners' Dilemma, where multiple players strategically decide whether to cooperate or attack. We analyze the conditions under which cooperation can be sustained, identifying key factors such as attack costs, discount rates, attack success rates, and trigger strategies that influence player behavior. We identify tipping points in the system dynamics, demonstrating that cooperation is more likely to be sustained when players are forward-looking. However, from a defense perspective, we find that simply reducing attack success probabilities can, paradoxically, incentivize attacks under certain conditions. Furthermore, defensive measures to cap the upper bound of attack success rates may prove futile in some scenarios. These insights highlight the complexity of securing LLM-based systems. Our work provides a theoretical foundation and practical insights for understanding and mitigating their vulnerabilities, while emphasizing the importance of adaptive security strategies and thoughtful ecosystem design.
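The paradox that lowering the attack success probability can incentivize attacks follows directly from the threshold $\delta^* = \frac{p - 2c}{p - \beta p^2 + p^2}$ given in the TL;DR. The minimal sketch below assumes the cooperation condition is $\delta \geq \delta^*$ and uses hypothetical parameters ($\delta = 0.6$, $c = 0.05$, $\beta = 0.4$) chosen only to exhibit the effect; the function names are illustrative, not the paper's.

```python
def delta_star(p: float, c: float, beta: float) -> float:
    """Critical discount factor, delta* = (p - 2c) / (p - beta*p^2 + p^2)."""
    return (p - 2 * c) / (p - beta * p**2 + p**2)


def sustains_cooperation(delta: float, p: float, c: float, beta: float) -> bool:
    """Cooperation condition as assumed here: cooperate iff delta >= delta*."""
    return delta >= delta_star(p, c, beta)


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper.
    delta, c, beta = 0.6, 0.05, 0.4
    for p in (1.0, 0.5):  # a "defense" lowers p from 1.0 to 0.5
        status = "cooperation" if sustains_cooperation(delta, p, c, beta) else "attack"
        print(f"p = {p:.1f}: delta* = {delta_star(p, c, beta):.4f} -> {status}")
```

Here $\delta^*$ rises from $0.5625$ at $p = 1$ to about $0.615$ at $p = 0.5$, so a player with $\delta = 0.6$ cooperates before the defense and attacks after it.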

Paper Structure (27 sections, 20 theorems, 53 equations, 3 figures, 1 table)

Key Result

Theorem 1

Two players prefer long-term cooperation over engaging in ranking manipulation attacks if and only if $\delta \geq \delta^*$, where $\delta^* = \frac{T - R}{T - Q} = \frac{p - 2c}{p - \beta p^2 + p^2}$ is the critical discount factor.
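The general form $\delta^* = \frac{T - R}{T - Q}$ is what a standard grim-trigger argument yields. The derivation below is a sketch under two assumptions not stated in this summary: that Theorem 1 uses a grim-trigger strategy (Tit-for-Tat is listed as an extension), and that $V_C$ and $V_D$ (notation borrowed from the Figure 2 caption) denote the discounted values of always cooperating and of a one-shot deviation followed by permanent mutual defection. Here $T$, $R$, and $Q$ are the temptation, mutual-cooperation, and mutual-defection per-period payoffs, with $T > R > Q$.

```latex
\begin{aligned}
V_C &= \sum_{t=0}^{\infty} \delta^t R = \frac{R}{1-\delta},
\qquad
V_D = T + \sum_{t=1}^{\infty} \delta^t Q = T + \frac{\delta Q}{1-\delta}, \\
V_C \ge V_D
&\iff R \ge (1-\delta)\,T + \delta Q
\iff \delta\,(T - Q) \ge T - R
\iff \delta \ge \frac{T - R}{T - Q} = \delta^* .
\end{aligned}
```

The last step divides by $T - Q > 0$; substituting the paper's payoffs in terms of $p$, $c$, and $\beta$ gives the closed form stated in the theorem.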

Figures (3)

  • Figure 1: Region of Cooperation Formation (the region to the right of the boundary)
  • Figure 2: $V_C$ and $V_D$ values ($\beta = 0.4$)
  • Figure 3: Region of Cooperation Formation (the region to the right of the boundary, Tit-for-Tat)

Theorems & Definitions (35)

  • Theorem 1: Cooperation Condition
  • proof
  • Corollary 1
  • Theorem 2: Monotonicity of $\delta^*$
  • proof
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 3: Cooperation Condition Under Single Defection and One-Time Retaliation
  • proof
  • ...and 25 more