Table of Contents
Fetching ...

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Ruoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang

TL;DR

AgentSys tackles indirect prompt injection by enforcing explicit memory management in LLM agents through a hierarchical architecture where worker agents isolate untrusted data and the main agent only ingests schema-bound, JSON-parsable returns. The design combines context isolation, a pre-declared intent interface, a validator for recursive tool use, and a bounded sanitizer-restart recovery loop, collectively reducing attack persistence while preserving task flexibility. Empirical results on AgentDojo and ASB show state-of-the-art security (ASR near 0% in long workflows) with competitive benign utility across multiple foundation models and under adaptive attacks. The work demonstrates a practical path toward secure, dynamic LLM agent systems and provides a reusable framework for memory-safe agent orchestration with realistic overhead constraints.

Abstract

Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

TL;DR

AgentSys tackles indirect prompt injection by enforcing explicit memory management in LLM agents through a hierarchical architecture where worker agents isolate untrusted data and the main agent only ingests schema-bound, JSON-parsable returns. The design combines context isolation, a pre-declared intent interface, a validator for recursive tool use, and a bounded sanitizer-restart recovery loop, collectively reducing attack persistence while preserving task flexibility. Empirical results on AgentDojo and ASB show state-of-the-art security (ASR near 0% in long workflows) with competitive benign utility across multiple foundation models and under adaptive attacks. The work demonstrates a practical path toward secure, dynamic LLM agent systems and provides a reusable framework for memory-safe agent orchestration with realistic overhead constraints.

Abstract

Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.
Paper Structure (26 sections, 11 equations, 4 figures, 7 tables)

This paper contains 26 sections, 11 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: AgentSys Overview. At step 1, the worker agent #1 is spawned to process the tool response, guided by the intent declared by the main agent. Worker agent #1 can recursively call tools and spawn worker agent #2, mediated by the alignment validator. After receiving the return value from worker agent #1 as a tool observation, the main agent continues to reason for step 2 within the global context, discarding the local context.
  • Figure 2: Main experimental results on ASB using GPT-4o-mini.
  • Figure 3: Trade-off among utility, security, and computational overhead on AgentDojo. (a) Security-Utility Trade-off: AgentSys achieves the best balance with highest utility and security. (b) Quality-Cost Trade-off: AgentSys attains the highest defense quality with comparable token cost.
  • Figure 4: Performance analysis by trajectory length on AgentDojo.