Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

Qianlong Lan; Anuj Kaul; Shaun Jones; Stephanie Westrum

TL;DR

The paper demonstrates that a malicious web page can induce an LLM agent to issue outbound requests that exfiltrate sensitive runtime context, even when the final response shown to the user appears harmless. This suggests that network egress should be treated as a first-class security outcome in agentic LLM systems.

Abstract

Agentic large language model systems increasingly automate tasks by retrieving URLs and calling external tools. We show that this workflow gives rise to implicit prompt injection: adversarial instructions embedded in automatically generated URL previews, including titles, metadata, and snippets, can introduce a system-level risk that we refer to as silent egress. Using a fully local and reproducible testbed, we demonstrate that a malicious web page can induce an agent to issue outbound requests that exfiltrate sensitive runtime context, even when the final response shown to the user appears harmless. In 480 experimental runs with a qwen2.5:7b-based agent, the attack succeeds with high probability (P(egress) = 0.89), and 95% of successful attacks are not detected by output-based safety checks. We also introduce sharded exfiltration, where sensitive information is split across multiple requests to avoid detection. This strategy reduces single-request leakage metrics by 73% (Leak@1) and bypasses simple data loss prevention mechanisms. Our ablation results indicate that defenses applied at the prompt layer offer limited protection, while controls at the system and network layers, such as domain allowlisting and redirect-chain analysis, are considerably more effective. These findings suggest that network egress should be treated as a first-class security outcome in agentic LLM systems. We outline architectural directions, including provenance tracking and capability isolation, that go beyond prompt-level hardening.
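
To make the system-layer controls named in the abstract concrete, the following is a minimal sketch of domain allowlisting combined with a redirect-chain check. It is not the paper's implementation: the hosts in ALLOWED_HOSTS and the function names is_egress_allowed and check_redirect_chain are hypothetical, and the sketch assumes the agent runtime can observe every URL in a redirect chain before the request is completed.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the paper does not state which domains were permitted.
ALLOWED_HOSTS = {"docs.example.com", "api.example.com"}


def is_egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for http(s) URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the hostname and strips userinfo and port,
    # so "https://docs.example.com@attacker.test/" resolves to "attacker.test".
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts


def check_redirect_chain(chain, allowed_hosts=ALLOWED_HOSTS):
    """Allow a fetch only if every hop in the redirect chain is allowlisted."""
    return bool(chain) and all(is_egress_allowed(u, allowed_hosts) for u in chain)
```

Exact host matching is a deliberately strict choice here: suffix or substring matching would accept look-alike hosts such as docs.example.com.attacker.test. Checking every hop, rather than only the first URL, is what keeps an allowlisted page from redirecting the agent to a destination that is not allowlisted.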

Paper Structure (45 sections, 4 equations, 2 figures, 7 tables)

Introduction
Implicit Prompt Injection as a Subclass of Indirect Injection
A Classical Security Perspective
Attack Chain Overview
Realistic Threat Vectors
Contributions
Background and Motivation
Agentic LLM Systems and the ReAct Loop
Automatic URL Previewing and Context Flattening
The Gap in Output-Centric Safety Evaluations
Threat Model
System Model
Adversary Model
Security Goal
Attack Chain Formalization
...and 30 more sections

Figures (2)

Figure 1: Silent Egress attack chain with visibility zones. Green indicates user-visible interactions; red indicates invisible operations. The attack's danger lies in its dual invisibility: steps 2--5 occur without user awareness, while the final response (step 6) appears completely benign.
Figure 2: Trade-off between stealth (sharding) and reliability. Sharded exfiltration shows reduced success across all surfaces, with the largest drop on Meta ($-37\%$). Body and Title/Anchor maintain relatively high success rates even with complex multi-step instructions.
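
The reason sharding defeats a simple per-request data loss prevention check can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's detector or metric: SECRET is a made-up canary value, and per_request_dlp, split_into_shards, and cumulative_dlp are hypothetical helper names. The per-request check looks for the whole secret in one payload, while the cumulative check inspects the payloads of a session together.

```python
# Made-up canary value standing in for sensitive runtime context.
SECRET = "sk-test-1234567890abcdef"


def per_request_dlp(payload, secret=SECRET):
    """Flag a single outbound payload only if it contains the whole secret."""
    return secret in payload


def split_into_shards(value, size):
    """Split a string into consecutive chunks of at most `size` characters."""
    return [value[i:i + size] for i in range(0, len(value), size)]


def cumulative_dlp(payloads, secret=SECRET):
    """Flag a session if its payloads, joined in order, contain the secret."""
    return secret in "".join(payloads)


shards = split_into_shards(SECRET, 6)
# No individual shard trips the per-request check...
print(any(per_request_dlp(s) for s in shards))
# ...but inspecting the session as a whole does.
print(cumulative_dlp(shards))
```

Even the cumulative check is easy to evade, for example by reordering or encoding the shards, which is consistent with the abstract's conclusion that controls on where requests may go are more robust than inspecting what each request contains.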
