Targeted Bit-Flip Attacks on LLM-Based Agents

Jialai Wang, Ya Wen, Zhongmou Liu, Yuxiao Wu, Bingyi He, Zongpeng Li, Ee-Chien Chang

Abstract

Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.

Paper Structure (14 sections, 8 equations, 2 figures, 5 tables)

This paper contains 14 sections, 8 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Overview of the two attack surfaces. Attack surface I manipulates the agent's final output: (1) a prompt-level attack is triggered when the user prompt contains the trigger (e.g., “sneakers”), and (2) an internal-trigger attack is activated when an internal candidate list contains the trigger (e.g., “Adidas”), influencing subsequent stages to recommend the attack-desired brand. Attack surface II manipulates intermediate invocations: for instance, when a stage input contains a trigger, the agent is forced to call an attack-desired service while preserving the final output.
  • Figure 2: ASR comparison of Flip-Agent and baselines under different bit-flip budgets. Flip-Agent reaches high ASR with far fewer bit flips than all baselines.
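For readers unfamiliar with the fault model behind the "bit-flip budget" in Figure 2, the following is a minimal sketch of what flipping a single bit does to one stored weight, and of how a budget could be counted. It is not Flip-Agent's algorithm. It assumes, purely for illustration, that weights are stored as signed 8-bit integers in two's complement; the source does not state the storage format, and the helper names `flip_bit_int8`, `hamming_distance_int8`, and `bit_flip_budget` are hypothetical.

```python
from typing import Sequence


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit integer (two's complement).

    Assumption for illustration: weights are int8-quantized.
    """
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = value & 0xFF          # reinterpret as an unsigned byte
    raw ^= 1 << bit             # the fault: toggle exactly one bit
    return raw - 256 if raw >= 128 else raw  # back to a signed value


def hamming_distance_int8(a: int, b: int) -> int:
    """Number of bit positions in which two int8 values differ."""
    return bin((a & 0xFF) ^ (b & 0xFF)).count("1")


def bit_flip_budget(original: Sequence[int], modified: Sequence[int]) -> int:
    """Total bits flipped between two equal-length int8 weight lists."""
    if len(original) != len(modified):
        raise ValueError("weight lists must have the same length")
    return sum(hamming_distance_int8(a, b) for a, b in zip(original, modified))


if __name__ == "__main__":
    w = 3
    print(flip_bit_int8(w, 0))  # toggles the least significant bit
    print(flip_bit_int8(w, 7))  # toggles the sign bit
    print(bit_flip_budget([3, 10, -1], [flip_bit_int8(3, 7), 10, -1]))
```

Under this assumed format, toggling the sign bit moves a weight much further than toggling a low-order bit, which is one intuition for why a small number of well-chosen flips can change model behavior.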