Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines

Xiyang Hu

TL;DR

This paper models ranking manipulation in LLM-based search engines as an Infinitely Repeated Prisoners' Dilemma (IRPD) between content providers, incorporating stochastic attack success $p$, attack cost $c$, discount factor $\delta$, and market degradation $\beta$. It derives the cooperation threshold $\delta^* = \frac{T - R}{T - Q} = \frac{p - 2c}{p - \beta p^2 + p^2}$ and demonstrates how cooperation depends non-monotonically on $p$, with finite-defect equilibria and regions where defense by simply capping $p$ can be futile. The study extends to Tit-for-Tat trigger strategies, one-time fixed costs, asymmetric players, and multi-player settings, revealing how degradation, costs, and forward-looking behavior shape stability and suggesting targeted, ecosystem-level defenses. Practically, the results inform adaptive security designs, including cost-based deterrents and reputation mechanisms, to foster robust and fair LLM-driven information ecosystems and extend to other AI-driven ranking and recommendation platforms. These insights provide a theoretical basis and actionable guidance for securing future large-language-model–mediated information systems against adversarial manipulation.
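The non-monotonic dependence of the threshold on $p$ can be checked numerically. The sketch below evaluates the closed form $\delta^* = \frac{p - 2c}{p - \beta p^2 + p^2}$ quoted above on a grid of attack success rates. The function name `critical_discount`, the cost `c = 0.1`, and the grid are illustrative assumptions, not values taken from the paper; `beta = 0.4` is borrowed from the caption of Figure 2 purely as a convenient setting.

```python
def critical_discount(p: float, c: float, beta: float) -> float:
    """Threshold delta* = (p - 2c) / (p - beta*p^2 + p^2), as quoted in the TL;DR."""
    return (p - 2 * c) / (p - beta * p**2 + p**2)


# Illustrative parameters (assumptions, not taken from the paper).
c, beta = 0.1, 0.4

# Scan p over (2c, 1]; at or below p = 2c the numerator is non-positive.
grid = [i / 100 for i in range(21, 101)]
thresholds = [critical_discount(p, c, beta) for p in grid]

# Locate the attack success rate at which the threshold is highest.
p_peak = grid[thresholds.index(max(thresholds))]

print(f"delta* peaks near p = {p_peak:.2f} with value {max(thresholds):.4f}")
print(f"delta* at p = 1.00 is {thresholds[-1]:.4f}")
```

With these assumed values the threshold rises with $p$, peaks at an interior point, and then falls again as $p$ approaches 1, which is the shape the non-monotonicity claim describes.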

Abstract

The increasing integration of Large Language Model (LLM) based search engines has transformed the landscape of information retrieval. However, these systems are vulnerable to adversarial attacks, especially ranking manipulation attacks, where attackers craft webpage content to manipulate the LLM's ranking and promote specific content, gaining an unfair advantage over competitors. In this paper, we study the dynamics of ranking manipulation attacks. We frame this problem as an Infinitely Repeated Prisoners' Dilemma, where multiple players strategically decide whether to cooperate or attack. We analyze the conditions under which cooperation can be sustained, identifying key factors such as attack costs, discount rates, attack success rates, and trigger strategies that influence player behavior. We identify tipping points in the system dynamics, demonstrating that cooperation is more likely to be sustained when players are forward-looking. However, from a defense perspective, we find that simply reducing attack success probabilities can, paradoxically, incentivize attacks under certain conditions. Furthermore, defensive measures to cap the upper bound of attack success rates may prove futile in some scenarios. These insights highlight the complexity of securing LLM-based systems. Our work provides a theoretical foundation and practical insights for understanding and mitigating their vulnerabilities, while emphasizing the importance of adaptive security strategies and thoughtful ecosystem design.

Paper Structure (27 sections, 20 theorems, 53 equations, 3 figures, 1 table)

Introduction
Related Literature
Vulnerabilities in Large Language Models
Ranking Manipulation in LLM-enhanced Search
Game Theory in Security Applications
Strategic Interactions in AI-driven Markets
Model Setup
Payoff Structure
Analysis
Condition for Cooperation
Cooperation Formation Region
Payoff Analysis of Cooperation and Defection in LLM Systems
Tit-for-Tat Trigger Strategy
Setting 1: Player 1 Defects in the First Round, Player 2 Retaliates Once
Setting 2: Alternating Cooperation and Defection
...and 12 more sections

Key Result

Theorem 1

Two players prefer long-term cooperation over engaging in ranking manipulation attacks if and only if $\delta \geq \delta^* = \frac{p - 2c}{p - \beta p^2 + p^2}$, where $\delta^*$ is the critical discount factor.
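As a rough illustration of how this condition interacts with defensive measures, the sketch below tests whether a given discount factor clears the threshold $\delta^* = \frac{p - 2c}{p - \beta p^2 + p^2}$ from the TL;DR, before and after a defender lowers the attack success rate. It assumes the condition takes the weak form $\delta \geq \delta^*$; the helper names and all numeric values are hypothetical and chosen only to make the effect visible.

```python
def critical_discount(p: float, c: float, beta: float) -> float:
    """Threshold delta* = (p - 2c) / (p - beta*p^2 + p^2), as quoted in the TL;DR."""
    return (p - 2 * c) / (p - beta * p**2 + p**2)


def cooperation_sustained(delta: float, p: float, c: float, beta: float) -> bool:
    """Assumed form of the condition: cooperation holds when delta >= delta*."""
    return delta >= critical_discount(p, c, beta)


# Hypothetical player and market parameters (not taken from the paper).
delta, c, beta = 0.503, 0.1, 0.4

# Attack success rate before and after a hypothetical defensive measure.
before = cooperation_sustained(delta, p=1.0, c=c, beta=beta)
after = cooperation_sustained(delta, p=0.8, c=c, beta=beta)

print(f"p = 1.0: delta* = {critical_discount(1.0, c, beta):.4f}, cooperate = {before}")
print(f"p = 0.8: delta* = {critical_discount(0.8, c, beta):.4f}, cooperate = {after}")
```

Under these assumed numbers, lowering $p$ from 1.0 to 0.8 raises the threshold slightly, so a player who cooperated before the defense no longer does. This matches the abstract's observation that reducing attack success probabilities can, under certain conditions, incentivize attacks.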

Figures (3)

Figure 1: Region of Cooperation Formation (the region to the right of the boundary)
Figure 2: $V_C$ and $V_D$ values ($\beta = 0.4$)
Figure 3: Region of Cooperation Formation (the region to the right of the boundary, Tit-for-Tat)

Theorems & Definitions (35)

Theorem 1: Cooperation Condition
proof
Corollary 1
Theorem 2: Monotonicity of $\delta^*$
proof
Proposition 1
Proposition 2
Proposition 3
Theorem 3: Cooperation Condition Under Single Defection and One-Time Retaliation
proof
...and 25 more
