Automating Agent Hijacking via Structural Template Injection

Xinhao Deng; Jiaqing Wu; Miao Chen; Yue Xiao; Ke Xu; Qi Li

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li

TL;DR

Phantom is proposed, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents and significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency.

Abstract

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increase structural diversity and then train a Template Autoencoder (TAE) to embed discrete templates into a continuous, searchable latent space. Subsequently, we apply Bayesian optimization to efficiently identify optimal adversarial vectors that are decoded into high-potency structured templates. Extensive experiments on Qwen, GPT, and Gemini demonstrate that our framework significantly outperforms existing baselines in both Attack Success Rate (ASR) and query efficiency. Moreover, we identified over 70 vulnerabilities in real-world commercial products that have been confirmed by vendors, underscoring the practical severity of structured template-based hijacking and providing an empirical foundation for securing next-generation agentic systems.

Automating Agent Hijacking via Structural Template Injection

TL;DR

Abstract

Paper Structure (28 sections, 7 equations, 6 figures, 12 tables, 3 algorithms)

This paper contains 28 sections, 7 equations, 6 figures, 12 tables, 3 algorithms.

Introduction
Background
LLM Agents and Chat Templates
Indirect Prompt Injection
Threat Model
Design of Phantom
Key Observation
Overview of Phantom
Design Details
Multi-level Template Augmentation
Latent Space Mapping
Automated Template Search
Evaluation
Experiment Setup
Evaluation on SOTA Agents
...and 13 more sections

Figures (6)

Figure 1: Threat model of Phantom. The adversary injects the structural template into external data sources accessed by the LLM agent, inducing role confusion to hijack the agent's execution flow.
Figure 2: The distribution of the Agent's attention over input tokens. Compared to direct injection, structured template injection effectively shifts the Agent's attention from authentic content to the injected prompt.
Figure 3: Overview of Phantom.
Figure 4: Attack success rates of Phantom across different Agents and scenarios in AgentDojo.
Figure 5: ASR convergence over optimization iterations.
...and 1 more figures

Automating Agent Hijacking via Structural Template Injection

TL;DR

Abstract

Automating Agent Hijacking via Structural Template Injection

Authors

TL;DR

Abstract

Table of Contents

Figures (6)