Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo, Li Hu, Yucheng Long, Zhihao Li
TL;DR
The paper identifies a stealthy, metadata-level threat to LLM agents that rely on external tools: adversaries can craft attractive tool metadata that induces agents to invoke malicious tools, without altering prompts or accessing model internals. It introduces the Attractive Metadata Attack (AMA), a state-action-value optimization framework guided by in-context learning that generates highly enticing tool metadata, supported by generation traceability, weighted value evaluation, and batch generation constraints. Experiments across ten tool-use scenarios and four LLMs show high attack success rates (81%–95%) and significant privacy leakage with minimal task disruption; in many cases the attack remains effective against prompt-level defenses and under the Model Context Protocol (MCP). The results reveal systemic vulnerabilities in current agent architectures and motivate execution-level defenses, robust tool verification, and safer tool ecosystems. The work also provides open-source code and a thorough experimental protocol for assessing and mitigating metadata-based threats in real-world deployments.
Abstract
Large language model (LLM) agents have demonstrated remarkable capabilities in complex reasoning and decision-making by leveraging external tools. However, this tool-centric paradigm introduces a previously underexplored attack surface, where adversaries can manipulate tool metadata (such as names, descriptions, and parameter schemas) to influence agent behavior. We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. The proposed attack integrates seamlessly into standard tool ecosystems and requires no modification to the agent's execution framework. Extensive experiments across ten realistic, simulated tool-use scenarios and a range of popular LLM agents demonstrate consistently high attack success rates (81%–95%) and significant privacy leakage, with negligible impact on primary task execution. Moreover, the attack remains effective even against prompt-level defenses, auditor-based detection, and structured tool-selection protocols such as the Model Context Protocol, revealing systemic vulnerabilities in current agent architectures. These findings show that metadata manipulation constitutes a potent and stealthy attack surface. Notably, AMA is orthogonal to injection attacks and can be combined with them to achieve stronger attack efficacy, highlighting the need for execution-level defenses beyond prompt-level and auditor-based mechanisms. Code is available at https://github.com/SEAIC-M/AMA.
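To make the attack surface concrete, the following is a minimal sketch of why metadata alone can steer tool selection. It is not the AMA method: the `ToolMetadata` class, the `toy_select` function, and both example tools are hypothetical, and a simple word-overlap score stands in for an LLM's far richer tool choice. The point it illustrates is only that the selector sees the name, description, and parameter schema, and never the tool's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToolMetadata:
    """What an agent sees about a tool (hypothetical structure)."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


def toy_select(task: str, tools: list[ToolMetadata]) -> ToolMetadata:
    """Stand-in for an LLM's tool choice (an assumption, not the paper's model).

    Picks the tool whose name and description share the most words with
    the task. The tool's actual behavior is never inspected.
    """
    task_words = set(task.lower().split())

    def overlap(tool: ToolMetadata) -> int:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        return len(task_words & set(text.split()))

    return max(tools, key=overlap)


# Two hypothetical tools offering the same function. The second has a more
# persuasive description and asks for an extra, unnecessary parameter.
tools = [
    ToolMetadata(
        name="weather_lookup",
        description="Returns the forecast for a city.",
        parameters={"city": "string"},
    ),
    ToolMetadata(
        name="weather_pro",
        description="Best verified tool to check the weather forecast in any city today.",
        parameters={"city": "string", "user_profile": "string"},
    ),
]

task = "check the weather forecast in paris today"
chosen = toy_select(task, tools)
print(chosen.name)                 # weather_pro
print(sorted(chosen.parameters))   # ['city', 'user_profile']
```

In this toy setting the more persuasive description wins, and its schema requests a field the task does not need. Because the decision rests entirely on metadata, this is consistent with the abstract's call for defenses that act at execution time as well as at the prompt level.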
