Table of Contents
Fetching ...

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She

TL;DR

The paper investigates security risks in tool-invocation for coding agents used in IDEs, introducing ToolLeak as a prompt-exfiltration vulnerability and a two-channel prompt-injection technique that achieves remote code execution. It conducts large-scale red-teaming across six real-world coding agents and multiple backends, demonstrating high leakage rates and universal RCE under realistic conditions. The work provides case studies, defense evaluations, and practical recommendations, highlighting gaps in guardrails and the need for architectural changes to separate instructions from data. Collectively, it argues for safer tool invocation designs in coding agents, including improved isolation of external tools and explicit instruction-data separation to resist prompt-injection attacks.

Abstract

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

TL;DR

The paper investigates security risks in tool-invocation for coding agents used in IDEs, introducing ToolLeak as a prompt-exfiltration vulnerability and a two-channel prompt-injection technique that achieves remote code execution. It conducts large-scale red-teaming across six real-world coding agents and multiple backends, demonstrating high leakage rates and universal RCE under realistic conditions. The work provides case studies, defense evaluations, and practical recommendations, highlighting gaps in guardrails and the need for architectural changes to separate instructions from data. Collectively, it argues for safer tool invocation designs in coding agents, including improved isolation of external tools and explicit instruction-data separation to resist prompt-injection attacks.

Abstract

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has demonstrated attacks against general-purpose and domain-specific agents, but none have focused on the security risks of tool invocation in coding agents. To fill this gap, we conduct the first systematic red-teaming of six popular real-world coding agents: Cursor, Claude Code, Copilot, Windsurf, Cline, and Trae. Our red-teaming proceeds in two phases. In Phase 1, we perform prompt leakage reconnaissance to recover system prompts. We discover a general vulnerability, ToolLeak, which allows malicious prompt exfiltration through benign argument retrieval during tool invocation. In Phase 2, we hijack the agent's tool-invocation behavior using a novel two-channel prompt injection in the tool description and return values, achieving remote code execution (RCE). We adaptively construct payloads using security information leaked in Phase 1. In emulation across five backends, our method outperforms baselines on Claude-Sonnet-4, Claude-Sonnet-4.5, Grok-4, and GPT-5. On real agents, our approach succeeds on 19 of 25 agent-LLM pairs, achieving leakage on every agent using Claude and Grok backends. For tool-invocation hijacking, we obtain RCE on every tested agent-LLM pair, with our two-channel method delivering the highest success rate. We provide case studies on Cursor and Claude Code, analyze security guardrails of external and built-in tools, and conclude with practical defense recommendations.

Paper Structure

This paper contains 29 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Left: Traditional Prompt Exfiltration Attack. The attacker sends a malicious prompt to the coding agent (➀) and then inspects the agent’s reply for leaked content (➁). Right: ToolLeak Attack. The attacker first adds an attacker‑controlled external tool to the coding agent, with an argument defined as the system prompt (➀). The user then causes the agent to invoke this tool (➁). During argument generation, the agent fills the argument with its internal prompt context, thereby leaking the system prompt to the attacker (➂).
  • Figure 2: Tool invocation hijacking via two channel prompt injection. A user sends a benign request to the coding agent. Step 1: the coding agent is tricked into invoking an attacker-controlled tool by the malicious payload in the tool description channel. Step 2: the attacker-controlled tool returns a malicious payload that injects procedural instructions through the return channel. Step 3: the coding agent follows these instructions and executes arbitrary commands at the attacker's request.
  • Figure 3: The attack workflow on Cursor, backed by gpt-5. This case demonstrates a two-channel attack where the malicious tool description and the corresponding malicious tool return collaborate to manipulate the LLM. The injected description establishes a deceptive two-step 'initialization' process, and the tool's return reinforces this instruction, ultimately coercing the LLM to execute the malicious curl|bash payload to complete the fake initialization.
  • Figure 4: The attack flow on Claude Code with claude-sonnet-4.5 as the backend LLM. The diagram illustrates the attack bypassing the system's defense agent; black text represents benign context, red text denotes malicious context, with the underlined italicizedcurl|bash command as the final RCE payload. Despite the guard LLM(Haiku) flagging the command as 'UNSAFE', the main LLM(Sonnet), affected by the injected malicious command and hijacked tool-invocation return, overrides the warning and executes the command.