A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, XiaoFeng Wang

TL;DR

A-MemGuard introduces a proactive defense framework for LLM-based agent memory that does not modify core architectures. It combines consensus-based validation across parallel reasoning paths with a dual-memory system that distills detected errors into lessons to prevent recurrence. The approach substantially lowers memory poisoning attack success rates in direct and indirect scenarios while preserving high benign-task accuracy, and it scales to multi-agent systems. By shifting from static filtering to experience-driven adaptation, the work offers a practical, generalizable security enhancement for memory-augmented agents.

Abstract

Large Language Model (LLM) agents use memory to learn from past interactions, enabling autonomous planning and decision-making in complex environments. However, this reliance on memory introduces a critical security risk: an adversary can inject seemingly harmless records into an agent's memory to manipulate its future behavior. This vulnerability has two core aspects. First, the malicious effect of injected records is only activated within a specific context, making them hard to detect when individual memory entries are audited in isolation. Second, once triggered, the manipulation can initiate a self-reinforcing error cycle: the corrupted outcome is stored as precedent, which not only amplifies the initial error but also progressively lowers the threshold for similar attacks in the future. To address these challenges, we introduce A-MemGuard (Agent-Memory Guard), the first proactive defense framework for LLM agent memory. Our core insight is that memory itself must become both self-checking and self-correcting. Without modifying the agent's core architecture, A-MemGuard combines two mechanisms: (1) consensus-based validation, which detects anomalies by comparing reasoning paths derived from multiple related memories, and (2) a dual-memory structure, where detected failures are distilled into "lessons" that are stored separately and consulted before future actions, breaking error cycles and enabling adaptation. Comprehensive evaluations on multiple benchmarks show that A-MemGuard cuts attack success rates by over 95% while incurring a minimal utility cost. This work shifts LLM memory security from static filtering to a proactive, experience-driven model where defenses strengthen over time. Our code is available at https://github.com/TangciuYueng/AMemGuard.
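The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Lesson`, `LessonMemory`, `consensus_validate`, and `guarded_step` are hypothetical, each reasoning path is reduced to a single proposed action string, and a plain majority vote stands in for the comparison of reasoning paths, which the paper performs with an LLM.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Lesson:
    """Hypothetical lesson record: an action that was flagged in a given context."""
    context: str
    rejected_action: str


class LessonMemory:
    """Hypothetical second memory, kept separate from the agent's ordinary memory."""

    def __init__(self) -> None:
        self._lessons = set()

    def add(self, lesson: Lesson) -> None:
        self._lessons.add(lesson)

    def rejects(self, context: str, action: str) -> bool:
        return Lesson(context, action) in self._lessons


def consensus_validate(paths: Dict[str, str]) -> Tuple[str, List[str]]:
    """Majority vote over the actions proposed by each memory's reasoning path.

    Assumption: a path is summarised by one action string. Returns the
    consensus action and the ids of memories whose path deviates from it.
    """
    consensus, _ = Counter(paths.values()).most_common(1)[0]
    outliers = [mem_id for mem_id, action in paths.items() if action != consensus]
    return consensus, outliers


def guarded_step(context: str, paths: Dict[str, str],
                 lessons: LessonMemory) -> Optional[str]:
    """Consult lessons first, then validate the remaining paths by consensus."""
    # Drop paths whose action was already flagged in this context.
    vetted = {m: a for m, a in paths.items() if not lessons.rejects(context, a)}
    if not vetted:
        return None  # nothing trustworthy left; the caller decides how to proceed
    consensus, outliers = consensus_validate(vetted)
    # Distil each deviation into a lesson so it cannot become precedent later.
    for mem_id in outliers:
        lessons.add(Lesson(context, vetted[mem_id]))
    return consensus


lessons = LessonMemory()
round1 = {"m1": "refund_original_card", "m2": "refund_original_card",
          "m3": "refund_new_account"}
first = guarded_step("refund", round1, lessons)

# In a later round the deviating action holds the majority, but the stored
# lesson filters it out before the vote takes place.
round2 = {"m3": "refund_new_account", "m4": "refund_new_account",
          "m1": "refund_original_card"}
second = guarded_step("refund", round2, lessons)
print(first, second)
```

In the second round a majority vote alone would pick the injected action, so the lesson store is what keeps the earlier detection from being overturned. This is the sense in which a dual-memory structure can break a self-reinforcing error cycle.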

Paper Structure

This paper contains 59 sections, 8 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: High-level Overview of A-MemGuard.
  • Figure 2: Architectural overview of A-MemGuard. Upon receiving a query, the agent retrieves multiple memories to form parallel reasoning paths. The consensus validation module (Sec. \ref{ssec:design1}) detects anomalies by identifying deviations from the group consensus. Any detected flaws are stored in the lesson memory of the dual-memory structure (Sec. \ref{ssec:design2}), which guides the agent to avoid repeating past errors before executing a final action.
  • Figure 2: Summary of average defensive performance against the indirect memory injection attack on MMLU \cite{wang2024mmlu}. The metric is Attack Success Rate (ASR), where lower is better ($\downarrow$). Our method consistently achieves the best average performance. Details are shown in Table \ref{tab:minja_results_detailed_updated} in the appendix.
  • Figure 3: Injection Success Rate (ISR) for undefended agents across interaction rounds. The steady increase illustrates the self-reinforcing error cycle.
  • Figure 4: Knowledge graph analysis of reasoning paths. The bar charts show the distribution of relations.
  • ...and 11 more figures