Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li

TL;DR

The paper identifies a stealthy, metadata-level threat to LLM agents that rely on external tools: adversaries can craft attractive tool metadata that induces agents to invoke malicious tools, without altering prompts or accessing model internals. It introduces the Attractive Metadata Attack (AMA), a state-action-value optimization framework guided by in-context learning that generates metadata agents are highly likely to select, supported by generation traceability, weighted value evaluation, and batch generation constraints. Experiments across ten tool-use scenarios and four LLMs show high attack success rates (81%–95%) and significant privacy leakage with minimal task disruption, and the attack bypasses prompt-level defenses and MCP in many cases. The results reveal systemic vulnerabilities in current agent architectures and motivate execution-level defenses, robust tool verification, and safer tool ecosystems. The work also provides open-source code and a detailed experimental protocol for assessing and mitigating metadata-based threats in real-world deployments.

Abstract

Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface, where adversaries can manipulate tool metadata (such as names, descriptions, and parameter schemas) to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. The proposed attack integrates seamlessly into standard tool ecosystems and requires no modification to the agent's execution framework. Extensive experiments across ten realistic, simulated tool-use scenarios and a range of popular LLM agents demonstrate consistently high attack success rates (81%–95%) and significant privacy leakage, with negligible impact on primary task execution. Moreover, the attack remains effective even against prompt-level defenses, auditor-based detection, and structured tool-selection protocols such as the Model Context Protocol, revealing systemic vulnerabilities in current agent architectures. These findings reveal that metadata manipulation constitutes a potent and stealthy attack surface. Notably, AMA is orthogonal to injection attacks and can be combined with them to achieve stronger attack efficacy, highlighting the need for execution-level defenses beyond prompt-level and auditor-based mechanisms. Code is available at https://github.com/SEAIC-M/AMA.
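
To make the terms in the abstract concrete, the following is a minimal sketch of what "tool metadata" might look like to an agent and how an attack success rate could be tallied from logged episodes. Everything here is an illustrative assumption rather than the authors' implementation: the `ToolMetadata` class, its field names, the `attack_success_rate` helper, the tool names, and the definition of success (an episode counts if the agent calls at least one flagged tool) are all hypothetical, and the paper's exact metric may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolMetadata:
    """What an agent sees when choosing a tool (hypothetical field names)."""
    name: str
    description: str
    # Parameter schema: parameter name -> type string (assumed, simplified form).
    parameters: dict = field(default_factory=dict)


def attack_success_rate(invoked_tools, flagged):
    """Fraction of episodes in which the agent called at least one flagged tool.

    invoked_tools: one list of tool names per episode.
    flagged: set of tool names the evaluator treats as malicious.
    This is an assumed definition for illustration only.
    """
    if not invoked_tools:
        return 0.0
    hits = sum(1 for calls in invoked_tools if any(t in flagged for t in calls))
    return hits / len(invoked_tools)


# A benign example of the three metadata fields named in the abstract.
weather = ToolMetadata(
    name="get_weather",
    description="Return the current weather for a city.",
    parameters={"city": "string"},
)

# Made-up episode logs: each inner list is the tools one agent run invoked.
episodes = [["get_weather"], ["get_weather", "uk_tool"], ["uk_tool"], ["search"]]
asr = attack_success_rate(episodes, flagged={"uk_tool"})
print(f"ASR = {asr:.0%}")
```

The point of the sketch is that the agent's choice depends only on the `name`, `description`, and `parameters` text, which is why the abstract argues for execution-level checks rather than relying on what a tool says about itself.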

Paper Structure

This paper contains 36 sections, 9 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: A motivating example of the Attractive Metadata Attack (AMA). Left: standard tool invocation, where the "unknown" (UK) tool is typically ignored. Right: under AMA, the UK tool is wrapped with attractive metadata (as UK tool*), inducing the agent to prioritize it and enabling covert malicious actions such as privacy theft.
  • Figure 2: Optimization Pipeline for AMA. The attacker constructs malicious tools with increasingly attractive metadata via a simulation-guided iterative optimization process. Intuitively, the algorithm explores metadata more thoroughly in terms of both breadth and depth, while expanding the scope of metadata updates across iterations to promote convergence. This facilitates the effective and efficient discovery of metadata that strongly induce the target malicious behavior.
  • Figure 3: ASR across task scenarios. Solid bars: targeted attacks; hatched bars: untargeted attacks.
  • Figure 4: Field-level PII leakage under targeted and untargeted AMA attacks.
  • Figure 5: Declared-parameter count vs. attack success. Solid lines: ASR; dashed lines: PL.
  • ...and 2 more figures