Multi-granular Adversarial Attacks against Black-box Neural Ranking Models

Yu-An Liu; Ruqing Zhang; Jiafeng Guo; Maarten de Rijke; Yixing Fan; Xueqi Cheng

TL;DR

The paper addresses vulnerabilities of black-box neural ranking models to adversarial perturbations and introduces RL-MARA, a reinforcement learning framework that orchestrates multi-granular perturbations (word/phrase/sentence) via two cooperative agents. By modeling attacks as a sequential decision process and using a surrogate ranking model plus an LLM-based naturalness evaluator as the environment, RL-MARA achieves higher attack effectiveness while maintaining fluency and imperceptibility. It demonstrates superior performance over single-granular baselines on MS MARCO and ClueWeb09, analyzes transferability in white-box vs black-box settings, and highlights the trade-off between attack strength and naturalness controlled by a hyperparameter. The work advances robust evaluation of NRMs and suggests directions for defense and broader granularity in adversarial text attacks.
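As a rough illustration of the trade-off mentioned above, the sketch below combines an attack signal with a naturalness score through a single weight. The function name `combined_reward`, the linear form, the use of rank improvement as the attack signal, and the [0, 1] naturalness range are all assumptions made for illustration; this summary does not give the paper's actual reward definition, and naming the weight `beta` after the β in Figure 4 is likewise an assumption.

```python
def combined_reward(rank_before: int, rank_after: int,
                    naturalness: float, beta: float) -> float:
    """Hypothetical reward: rank improvement plus a weighted naturalness score.

    rank_before / rank_after: position of the target document under a
    surrogate ranker (1 = top), before and after the perturbation.
    naturalness: assumed score in [0, 1] from some fluency evaluator.
    beta: assumed weight trading attack strength against naturalness.
    """
    if not 0.0 <= naturalness <= 1.0:
        raise ValueError("naturalness is assumed to lie in [0, 1]")
    rank_gain = rank_before - rank_after  # positive when the document moves up
    return rank_gain + beta * naturalness


# A document promoted from rank 10 to rank 4 with a mid-range fluency score.
reward = combined_reward(rank_before=10, rank_after=4, naturalness=0.5, beta=2.0)
```

Under this assumed form, a larger `beta` makes the reward favour fluent perturbations over aggressive rank gains, and a smaller `beta` does the opposite.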

Abstract

Adversarial ranking attacks have gained increasing attention due to their success in probing vulnerabilities, and, hence, enhancing the robustness, of neural ranking models. Conventional attack methods employ perturbations at a single granularity, e.g., word or sentence level, to target documents. However, limiting perturbations to a single level of granularity may reduce the flexibility of adversarial examples, thereby diminishing the potential threat of the attack. Therefore, we focus on generating high-quality adversarial examples by incorporating multi-granular perturbations. Achieving this objective involves tackling a combinatorial explosion problem, which requires identifying an optimal combination of perturbations across all possible levels of granularity, positions, and textual pieces. To address this challenge, we transform the multi-granular adversarial attack into a sequential decision-making process, where perturbations in the next attack step build on the perturbed document in the current attack step. Since the attack process can only access the final state without direct intermediate signals, we use reinforcement learning to perform multi-granular attacks. During the reinforcement learning process, two agents work cooperatively to identify multi-granular vulnerabilities as attack targets and organize perturbation candidates into a final perturbation sequence. Experimental results show that our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
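The sequential view described in the abstract can be sketched as a toy loop in which each perturbation is applied to the document produced by the previous step, and a score is observed only once the whole sequence has been applied. The `Perturbation` type, its fields, and the `final_score` callback are hypothetical names introduced here; the sketch does not reproduce the paper's agents, candidate generation, or training procedure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Perturbation:
    granularity: str              # "word", "phrase", or "sentence" (label only)
    start: int                    # index of the first token to replace
    length: int                   # number of tokens to replace
    replacement: Tuple[str, ...]  # tokens substituted in their place


def apply_perturbation(tokens: List[str], p: Perturbation) -> List[str]:
    """Return a new token list with the span replaced; the input is not mutated."""
    return tokens[:p.start] + list(p.replacement) + tokens[p.start + p.length:]


def run_attack(tokens: Sequence[str],
               sequence: Sequence[Perturbation],
               final_score: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Apply perturbations in order; positions refer to the current document.

    Only the final document is scored, mirroring the setting in which no
    intermediate signal is available during the attack.
    """
    doc = list(tokens)
    for p in sequence:
        doc = apply_perturbation(doc, p)
    return doc, final_score(doc)


steps = [
    Perturbation("word", start=1, length=1, replacement=("dog",)),
    Perturbation("phrase", start=2, length=1, replacement=("ran", "away")),
]
# A stand-in scorer (document length) keeps the example self-contained.
adv_doc, score = run_attack("the cat sat".split(), steps, final_score=len)
```

Because the second step indexes into the already perturbed document, reordering the steps can change the result, which is one reason the choice of perturbation sequence matters.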

Paper Structure (22 sections, 3 equations, 5 figures, 4 tables)

This paper contains 22 sections, 3 equations, 5 figures, 4 tables.

Introduction
Problem Statement
Preliminaries
Method
Overview
Environment and reward
Multi-granular attacker
Sub-agent: vulnerability indicator
Meta-agent: perturbation aggregator
Training with policy gradient
Discussion
Experimental Settings
Datasets
Evaluation metrics
Models
...and 7 more sections

Figures (5)

Figure 1: To promote a target document in the rankings for a query, we identify multi-granular texts within the document as attack targets to generate effective adversarial examples.
Figure 2: The RL-MARA framework.
Figure 3: Instruction for naturalness evaluation with ChatGPT. The gray and dark blue blocks indicate the inputs and outputs of the model, respectively.
Figure 4: The impact of hyper-parameter $\beta$ on the attack performance of RL-MARA against RankLLM on MS MARCO.
Figure 5: (Left): Attack performance changes of RL-MARA against RankLLM on MS MARCO in the white-box setting, compared to black-box setting. (Right): Attack performance changes of RL-MARA against RankLLM on MS MARCO in the OOD scenario, compared to the IID scenario.
