Table of Contents
Fetching ...

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

Hwan Chang, Yonghyun Jun, Hwanhee Lee

TL;DR

This work reveals a new vulnerability in LLM agents that rely on structured chat templates, showing that attackers can exploit role-token formatting to perform indirect prompt injection. By introducing ChatInject and a persuasion-enhanced Multi-turn variant, the authors demonstrate substantial gains in attack success rates across frontier models and benchmarks, including transfer to closed-source systems. The study shows that template-based payloads transfer with template similarity, and that mixture-based strategies (MoT) enable attacks even when the attacker lacks knowledge of the target backbone. Existing defenses largely fail against ChatInject, and simple template perturbations do not fully mitigate the threat, highlighting the need for template-aware, robust defense mechanisms. Overall, the work exposes critical security gaps in current agent designs and motivates the development of defenses that account for template structure and multi-turn contextual manipulation.”

Abstract

The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs' dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model's inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

TL;DR

This work reveals a new vulnerability in LLM agents that rely on structured chat templates, showing that attackers can exploit role-token formatting to perform indirect prompt injection. By introducing ChatInject and a persuasion-enhanced Multi-turn variant, the authors demonstrate substantial gains in attack success rates across frontier models and benchmarks, including transfer to closed-source systems. The study shows that template-based payloads transfer with template similarity, and that mixture-based strategies (MoT) enable attacks even when the attacker lacks knowledge of the target backbone. Existing defenses largely fail against ChatInject, and simple template perturbations do not fully mitigate the threat, highlighting the need for template-aware, robust defense mechanisms. Overall, the work exposes critical security gaps in current agent designs and motivates the development of defenses that account for template structure and multi-turn contextual manipulation.”

Abstract

The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs' dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model's inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.

Paper Structure

This paper contains 52 sections, 2 equations, 8 figures, 20 tables.

Figures (8)

  • Figure 1: A comparison of injection methods. In Case 1, the agent ignores a standard plain-text injection (Default InjecPrompt). In Case 2, the ChatInject attack uses forged chat template tokens to deceive the agent into executing the malicious command.
  • Figure 2: Four attack payload variants embedded in the tool response $R_{T_u}$, categorized by injection method—plain text (left) vs. forged chat templates with ChatInject (right)—and by content: a pure attacker instruction (top) or multi-turn conversation (bottom). $\oplus$ denotes line-wise concatenation.
  • Figure 3: Performance of cross-model ChatInject attacks. As template similarity increases, the ASR (left) rises, while the model's Utility (right) degrades. The shaded region represents the 95% confidence interval for each result, computed using the Wilson Interval.
  • Figure 4: Visualization of the mean and std. for Single vs. MoT settings; the dashed line marks ASR of Default InjecPrompt.
  • Figure 5: Comparison of ASR (top) and Utility (bottom) for Qwen-3 and Grok-3 across defense configurations, aggregated over all attack types. Baselines are the per-model scores without defense: Default InjecPrompt and Default Multi-turn. The shaded region represents the 95% confidence interval for each result, computed using the Wilson Interval.
  • ...and 3 more figures