ToolTweak: An Attack on Tool Selection in LLM-based Agents

Jonathan Sneh, Ruomei Yan, Jialin Yu, Philip Torr, Yarin Gal, Sunando Sengupta, Eric Sommerlade, Alasdair Paren, Adel Bibi

TL;DR

ToolTweak reveals a critical vulnerability in LLM-based agent tool-use: agents rely on natural-language tool metadata, which can be adversarially tuned to bias tool selection. The authors present a gradient-free, transferable attack that iteratively refines a target tool’s name and description, boosting its selection rate from roughly 20% to as high as 81% and skewing overall tool-use distributions. They validate the attack across multiple models and tasks, quantify distributional shifts, and show that defenses like paraphrasing can reduce bias but do not fully neutralize the risk. The work highlights significant fairness, security, and market-competition concerns in tool ecosystems and advocates for robust defenses and fairness-aware design, with code to be open-sourced upon acceptance.
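The iterative, gradient-free refinement described above can be pictured as a simple propose-and-evaluate loop. The sketch below is only an illustration under stated assumptions: this section does not give the paper's algorithm, so the greedy keep-the-best rule, the `ToolMetadata` type, and the `propose` and `selection_rate` callables are all hypothetical. In practice `propose` would wrap an LLM call and `selection_rate` would query an agent over many tasks; here both are replaced by deterministic toy stand-ins so the loop runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToolMetadata:
    """Hypothetical container for the two fields being refined."""
    name: str
    description: str


History = List[Tuple[ToolMetadata, float]]


def refine_metadata(
    initial: ToolMetadata,
    propose: Callable[[ToolMetadata, History], ToolMetadata],
    selection_rate: Callable[[ToolMetadata], float],
    iterations: int = 5,
) -> Tuple[ToolMetadata, float, History]:
    """Greedy, gradient-free search: keep a candidate only if it scores higher.

    Assumption: the acceptance rule is a plain hill climb, chosen for
    illustration and not taken from the paper.
    """
    best, best_rate = initial, selection_rate(initial)
    history: History = [(initial, best_rate)]
    for _ in range(iterations):
        # The proposer sees the current best and all past (candidate, rate) pairs.
        candidate = propose(best, history)
        rate = selection_rate(candidate)
        history.append((candidate, rate))
        if rate > best_rate:
            best, best_rate = candidate, rate
    return best, best_rate, history


# Toy stand-ins (assumptions) so the sketch is runnable without any model.
def toy_propose(current: ToolMetadata, history: History) -> ToolMetadata:
    return ToolMetadata(current.name, current.description + " (revised)")


def toy_selection_rate(meta: ToolMetadata) -> float:
    revisions = meta.description.count("(revised)")
    return min(1.0, 0.2 + 0.1 * revisions)


best, best_rate, history = refine_metadata(
    ToolMetadata("WeatherTool", "Returns the forecast."),
    toy_propose,
    toy_selection_rate,
    iterations=3,
)
```

Because the loop only needs a score for each candidate, it requires no gradients or model internals, which is what makes this style of search applicable to closed-source models.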

Abstract

As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities. These agents typically select tools from growing databases or marketplaces to solve user tasks, creating implicit competition among tool providers and developers for visibility and usage. In this paper, we show that this selection process harbors a critical vulnerability: by iteratively manipulating tool names and descriptions, adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%, with strong transferability between open-source and closed-source models. Beyond individual tools, we show that such attacks cause distributional shifts in tool usage, revealing risks to fairness, competition, and security in emerging tool ecosystems. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally. All code will be open-sourced upon acceptance.
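Of the two defenses named above, perplexity filtering is the easier to sketch: a description is scored by the perplexity a language model assigns to it, $\exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)$, and tools scoring above a threshold are dropped. The example below is a minimal sketch, not the paper's implementation. It assumes per-token natural-log probabilities have already been obtained from some language model, and the function names, the input layout, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def filter_tools(
    tool_log_probs: Dict[str, List[float]], threshold: float
) -> List[str]:
    """Keep tools whose description perplexity is at or below the threshold.

    Assumption: the threshold is chosen by the defender, for example from
    the perplexities of known-benign descriptions.
    """
    return [
        name
        for name, log_probs in tool_log_probs.items()
        if perplexity(log_probs) <= threshold
    ]


# Made-up log-probabilities for two hypothetical tools.
example = {
    "fluent_tool": [math.log(0.5)] * 4,    # perplexity close to 2
    "garbled_tool": [math.log(0.01)] * 4,  # perplexity close to 100
}
kept = filter_tools(example, threshold=10.0)
```

A filter of this kind can only flag text that looks unnatural to the scoring model, so a manipulated description written in fluent language would be expected to pass it.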

Paper Structure

This paper contains 40 sections, 2 equations, 16 figures, 2 tables, 1 algorithm.

Figures (16)

  • Figure 1: ToolTweak: Adversarial Manipulation of Tool Selection. Illustration of ToolTweak, an adversarial attack that iteratively refines a tool’s name and description using LLM feedback to maximize its likelihood of being selected. The figure shows how benign tool selection is distributed across multiple options, whereas after ToolTweak interventions, the targeted tool, renamed BestWeather, dominates selection rates.
  • Figure 2: Average Normalized Improvement for all tools on all six models
  • Figure 3: $D_{\textrm{JS}}$ Before Attack (x-axis) vs. After Attack (y-axis) per model. Each dot represents the $D_{\textrm{JS}}$ between the observed distribution and $\text{p}_{\text{d}_{t^*}}$ for a specific cluster and tool. Attacks below the $y=x$ line were able to improve tool selection rate. The closer they are to the line $y=0$, the more successful the attack.
  • Figure 4: $D_{\textrm{JS}}$ Before Defense vs. After Defense per model. Each dot represents the $D_{\textrm{JS}}$ between the observed distribution and $\text{p}_{\text{d}_{t^*}}$ and $\textrm{Unif}(\mathcal{T})$ for a specific cluster and tool.
  • Figure 5: Transferability Heatmap of Effectiveness of Tools Augmented by Attacker LLMs
  • ...and 11 more figures
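
The $D_{\textrm{JS}}$ values in Figures 3 and 4 are Jensen-Shannon divergences between an observed tool-selection distribution and a reference distribution. The sketch below shows one way to compute such a value from logged selections. It assumes that $\text{p}_{\text{d}_{t^*}}$ places all probability mass on the target tool $t^*$, which is a reading of the notation and is not stated in this section. The function names, the tool names, and the selection counts are hypothetical. Base-2 logarithms are used so that the divergence lies in $[0, 1]$.

```python
import math
from collections import Counter
from typing import List, Sequence


def selection_distribution(selections: Sequence[str], tools: Sequence[str]) -> List[float]:
    """Empirical probability of each tool being selected, in `tools` order."""
    counts = Counter(selections)
    total = len(selections)
    return [counts[tool] / total for tool in tools]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical cluster of five functionally similar tools; "tool_a" is the target.
tools = ["tool_a", "tool_b", "tool_c", "tool_d", "tool_e"]
target = [1.0, 0.0, 0.0, 0.0, 0.0]  # assumed point mass on the target tool

# Made-up selection logs: 20 of 100 picks before the attack, 81 of 100 after.
before = selection_distribution(
    ["tool_a"] * 20 + ["tool_b"] * 20 + ["tool_c"] * 20
    + ["tool_d"] * 20 + ["tool_e"] * 20,
    tools,
)
after = selection_distribution(
    ["tool_a"] * 81 + ["tool_b"] * 5 + ["tool_c"] * 5
    + ["tool_d"] * 5 + ["tool_e"] * 4,
    tools,
)

js_before = js_divergence(before, target)
js_after = js_divergence(after, target)
```

Under this reading, a successful attack moves the observed distribution toward the point mass, so `js_after` is smaller than `js_before`, which matches the caption's description of points falling below the $y=x$ line. Note that `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the Jensen-Shannon distance), so the two should not be compared directly.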