The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin, Shuai Wang, Bevan Koopman, Guido Zuccon

TL;DR

A comprehensive empirical study of jailbreak prompt attacks against LLM rankers. Among other findings, it shows that encoder-decoder architectures exhibit strong inherent resilience to these attacks.

Abstract

Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has, however, shown that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, which measures intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, which quantifies the operational impact on ranking quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities and reveal, among other insights, that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
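
To make the two evaluation measures named in the abstract concrete, the following is a minimal Python sketch of how ASR and nDCG@10 could be computed. It is an illustration, not the authors' implementation: the function names are hypothetical, a trial is assumed to count as a success when the ranker ends up preferring the injected document, and nDCG is assumed to use linear gain with the ideal ordering taken from the judged list itself.

```python
import math
from typing import Sequence


def attack_success_rate(successes: Sequence[bool]) -> float:
    """Fraction of attacked trials in which the attack succeeded.

    Assumption: each entry is True when the ranker preferred the
    injected document in that trial.
    """
    if not successes:
        raise ValueError("need at least one trial")
    return sum(successes) / len(successes)


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k for a ranking given as relevance labels in ranked order.

    Assumption: the ideal ranking is the same labels sorted in
    descending order, i.e. every judged document is in the list.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical example: an injected non-relevant document is promoted to rank 1.
clean_ranking = [3, 2, 1, 0]
attacked_ranking = [0, 3, 2, 1]
ndcg_drop = ndcg_at_k(clean_ranking) - ndcg_at_k(attacked_ranking)
```

Under these assumptions, ASR captures how often the ranker's preference is hijacked, while the difference in nDCG@10 between the clean and attacked rankings captures how much ranking quality degrades as a result.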

Paper Structure (34 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Overview of the prompt injection method proposed by qian2025ranking.
  • Figure 2: Model scaling effects for Qwen3 and Gemma-3 backbones on TREC-DL-2019. The plot shows attack success rate as a function of model parameter size (log-scale).
  • Figure 3: Position sensitivity analysis on TREC-DL-2019. Shaded regions represent the performance gap between front and back injection placements. Solid and dashed lines denote back and front effectiveness, respectively.
  • Figure 4: Cross-domain robustness analysis. (a) Grouped bar charts compare average ASR on general-purpose TREC-DL versus domain-specific BEIR datasets across ranking paradigms. (b) Scatter plot shows the correlation between general- and domain-specific ASR. (c) Heatmap summarizes cross-domain ASR by dataset and model.
  • Figure 5: Comparative analysis of model vulnerability across injection positions. $\star$: statistically significant position effect.
  • ...and 2 more figures