Unveiling Privacy Risks in LLM Agent Memory

Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, Pengfei He

TL;DR

The paper unveils privacy risks associated with memory modules in LLM agents by introducing MEXTRA, a black-box memory extraction attack powered by carefully designed attacking prompts and an automated diversity-based prompt generator. Through experiments on EHRAgent and RAP, it demonstrates that memory leakage is practical and sensitive to memory configuration, prompting strategies, and attacker knowledge. It shows that adversaries can extract substantial private memory content, underscoring the need for safeguards such as input/output controls and memory sanitization. The work also discusses limitations, including a single-agent focus, and points to future work on inter-agent memory sharing and session-level isolation to mitigate leakage.

Abstract

Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications. They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations, introducing new privacy risks for LLM agents. In this work, we systematically investigate the vulnerability of LLM agents to our proposed Memory EXTRaction Attack (MEXTRA) under a black-box setting. To extract private information from memory, we propose an effective attacking prompt design and an automated prompt generation method based on different levels of knowledge about the LLM agent. Experiments on two representative agents demonstrate the effectiveness of MEXTRA. Moreover, we explore key factors influencing memory leakage from both the agent designer's and the attacker's perspectives. Our findings highlight the urgent need for effective memory safeguards in LLM agent design and deployment.

Paper Structure

This paper contains 51 sections, 7 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: The workflow of a web agent with a memory module for a normal user query (left) and an attacking prompt (right). Only the first-step solution is shown for the normal user query; subsequent actions such as "click [Buy Now]" are omitted because the focus is on comparing it with the extraction attack.
  • Figure 2: The extracted efficiency (EE) across different memory sizes $m$ ranging from 50 to 500 on two agents.
  • Figure 3: The extracted number (EN) and retrieved number (RN) across different retrieval depths $k$ ranging from 1 to 5 on two agents.
  • Figure 4: The impact of the number of attacking prompts $n$ and the prompt generation instructions ${\mathcal{I}}^{\text{advan}}$/${\mathcal{I}}^{\text{basic}}$ on extracted number (EN) and retrieved number (RN). The memory size is 200.
  • Figure 5: The overlap among retrieved queries on two agents. The results are derived based on the setting detailed in the agent setup section. The retrieved numbers are 55 and 27 for EHRAgent and RAP, respectively.