QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents

Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She

TL;DR

This work formalizes a query-agnostic indirect prompt injection (IPI) threat on coding agents in IDEs, showing that leaking an agent's internal prompt enables a constrained white-box optimization to craft malicious tool descriptions. The authors introduce QueryIPI, an automated method that iteratively mutates tool descriptions using a Mutation LLM and evaluates them with a Judge LLM to maximize a cumulative attack score across training queries. Experiments on five simulated agents and real-world transfers demonstrate high attack success rates, robustness to partial prompt knowledge, cross-LLM transferability, and stealth against standard detection metrics. The findings reveal a practical security risk posed by exposed internal prompts and underscore the need for defenses that harden tool-description channels and guardrails in LLM-based coding agents.

Abstract

Modern coding agents integrated into IDEs combine powerful tools and system-level actions, exposing a high-stakes attack surface. Existing Indirect Prompt Injection (IPI) studies focus mainly on query-specific behaviors, leading to unstable attacks with lower success rates. We identify a more severe, query-agnostic threat that remains effective across diverse user inputs. This challenge can be overcome by exploiting a common vulnerability: leakage of the agent's internal prompt, which turns the attack into a constrained white-box optimization problem. We present QueryIPI, the first query-agnostic IPI method for coding agents. QueryIPI refines malicious tool descriptions through an iterative, prompt-based process informed by the leaked internal prompt. Experiments on five simulated agents show that QueryIPI achieves up to 87 percent success, outperforming baselines, and the generated malicious descriptions also transfer to real-world systems, highlighting a practical security risk to modern LLM-based coding agents.
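The iterative refinement described above can be pictured as a search loop that keeps whichever candidate scores highest when summed over a fixed set of training queries. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it uses plain greedy hill-climbing, the names `cumulative_score`, `refine`, `toy_mutate`, and `toy_judge` are hypothetical, and the two toy functions are deterministic string stand-ins for the Mutation LLM and Judge LLM so that the loop structure can be run and inspected without any model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases: a "mutator" proposes n variants of a candidate,
# a "judge" scores one candidate against one query.
Mutator = Callable[[str, int], List[str]]
Judge = Callable[[str, str], float]


def cumulative_score(candidate: str, queries: Sequence[str], judge: Judge) -> float:
    """Sum the judge's per-query scores over all training queries."""
    return sum(judge(candidate, q) for q in queries)


def refine(
    seed: str,
    queries: Sequence[str],
    mutate: Mutator,
    judge: Judge,
    rounds: int = 3,
    candidates_per_round: int = 4,
) -> Tuple[str, float, List[float]]:
    """Greedy hill-climbing: keep a variant only if it strictly improves
    the cumulative score. This is an assumed search strategy, chosen for
    simplicity; the source does not specify it at this level of detail."""
    best = seed
    best_score = cumulative_score(best, queries, judge)
    history = [best_score]
    for _ in range(rounds):
        # Variants are generated once per round from the round's starting point.
        for cand in mutate(best, candidates_per_round):
            score = cumulative_score(cand, queries, judge)
            if score > best_score:
                best, best_score = cand, score
        history.append(best_score)
    return best, best_score, history


# Toy stand-ins (assumptions, not LLM calls): the mutator appends one letter,
# and the judge returns 1.0 when the candidate contains the query's first letter.
def toy_mutate(candidate: str, n: int) -> List[str]:
    return [candidate + ch for ch in "abcd"[:n]]


def toy_judge(candidate: str, query: str) -> float:
    return 1.0 if query[0] in candidate else 0.0


if __name__ == "__main__":
    best, score, history = refine("", ["alpha", "beta", "cat"], toy_mutate, toy_judge)
    print(best, score, history)
```

Because the score is summed over every training query rather than taken from a single one, a candidate is only retained when it helps across the whole query set, which is the sense in which the objective is query-agnostic.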

Paper Structure

This paper contains 21 sections, 3 equations, 1 figure, 5 tables, 1 algorithm.

Figures (1)

  • Figure 1: Comparison of Query-Specific and Query-Agnostic Indirect Prompt Injection. The left panel depicts a classic IPI attack, consistent with the threat models in recent benchmarks such as AgentDojo (Debenedetti et al., 2024) and InjecAgent (Zhan et al., 2024). The attack is triggered only when a specific user query (①) causes the agent to invoke a compromised tool (②), which then injects malicious content (③). The probabilistic nature of this process is represented by dashed lines. In contrast, the right panel illustrates our query-agnostic attack, where the agent is compromised regardless of the user's input (①), leading to a deterministic malicious outcome (②), as shown by the solid arrow. Malicious actions and content are highlighted in red.