PivotAttack: Rethinking the Search Trajectory in Hard-Label Text Attacks via Pivot Words

Yuzhi Liang, Shiliang Xiao, Jingsong Wei, Qiliang Lin, Xia Li

TL;DR

PivotAttack is a query-efficient "inside-out" framework that employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups that act as prediction anchors) and strategically perturbs them to induce label flips, capturing inter-word dependencies while minimizing query costs.

Abstract

Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups acting as prediction anchors) and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments across traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
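
The "inside-out" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy classifier, the masking scheme, the use of UCB1 as the bandit rule, and every name below (`toy_classifier`, `find_pivot_set`, `flip_by_substitution`, the `[MASK]` token) are assumptions made for illustration. Here the "retention precision" of a token set is taken to mean the fraction of randomly masked variants that keep the original label when the set's tokens are left untouched; the paper may define it differently.

```python
import math
import random
from itertools import combinations


def toy_classifier(tokens):
    """Hypothetical hard-label model: returns only a class label, never scores."""
    has_pos = any(t in {"good", "great"} for t in tokens)
    has_neg = any(t in {"not", "never"} for t in tokens)
    return 1 if has_pos and not has_neg else 0


def keeps_label(tokens, keep_idx, label, rng, p_mask=0.5):
    """One model query: mask random tokens outside keep_idx, test label retention."""
    perturbed = [
        tok if (i in keep_idx or rng.random() > p_mask) else "[MASK]"
        for i, tok in enumerate(tokens)
    ]
    return toy_classifier(perturbed) == label


def find_pivot_set(tokens, max_size=2, budget=300, seed=0):
    """UCB1 over candidate token sets (arms); reward = label retained (0 or 1)."""
    rng = random.Random(seed)
    label = toy_classifier(tokens)
    arms = [
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(range(len(tokens)), k)
    ]
    pulls = {a: 0 for a in arms}
    wins = {a: 0 for a in arms}

    for t in range(1, budget + 1):
        def ucb(a):
            if pulls[a] == 0:
                return float("inf")  # try every arm at least once
            return wins[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])

        arm = max(arms, key=ucb)
        pulls[arm] += 1
        wins[arm] += int(keeps_label(tokens, arm, label, rng))

    # Prefer the highest empirical precision, then the smallest set.
    best = max(arms, key=lambda a: (wins[a] / max(pulls[a], 1), -len(a)))
    return sorted(best), wins[best] / max(pulls[best], 1)


def flip_by_substitution(tokens, pivot, substitutes):
    """Replace pivot tokens with candidate substitutes until the label flips."""
    label = toy_classifier(tokens)
    for sub in substitutes:
        candidate = [sub if i in pivot else tok for i, tok in enumerate(tokens)]
        if toy_classifier(candidate) != label:
            return candidate
    return None


if __name__ == "__main__":
    sentence = "the movie was really good".split()
    pivot, precision = find_pivot_set(sentence)
    print("pivot indices:", pivot, "retention precision:", precision)
    print("adversarial:", flip_by_substitution(sentence, pivot, ["fine", "bad"]))
```

In this toy setting the bandit spends most of its query budget on sets that appear to anchor the prediction, and only those tokens are then perturbed. A fuller attack would also rank the successful variants by similarity to the original sentence, which this sketch omits.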

Paper Structure (27 sections, 13 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: The overall workflow of PivotAttack. The framework first identifies Pivot Sets that anchor the model’s prediction, selecting sets with high retention precision and refining them via a multi-armed bandit. It then generates substitutions for the pivot words and selects the variant most similar to the original sentence as the final adversarial example.
  • Figure 2: Pivot Set Identification on MR
  • Figure 3: ASR vs. Query Budget: MR (Qwen2.5-FT)
  • Figure 4: Human Evaluation
  • Figure 5: Zero-shot LLM prompts used in hard-label evaluation on different classes. These are the exact prompts we use during evaluation.
  • ...and 2 more figures