Unveiling Privacy Risks in LLM Agent Memory
Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, Pengfei He
TL;DR
The paper unveils privacy risks associated with memory modules in LLM agents by introducing MEXTRA, a black-box memory extraction attack powered by carefully designed attacking prompts and an automated diversity-based prompt generator. Experiments on EHRAgent and RAP demonstrate that memory leakage is practical, that adversaries can extract substantial private memory content, and that the amount leaked is sensitive to memory configuration, prompting strategies, and attacker knowledge. These results underscore the need for safeguards such as input/output controls and memory sanitization. The work also discusses limitations, including a single-agent focus, and points to future work on inter-agent memory sharing and session-level isolation to mitigate leakage.
Abstract
Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications. They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations, introducing new privacy risks for LLM agents. In this work, we systematically investigate the vulnerability of LLM agents to our proposed Memory EXTRaction Attack (MEXTRA) under a black-box setting. To extract private information from memory, we propose an effective attacking prompt design and an automated prompt generation method based on different levels of knowledge about the LLM agent. Experiments on two representative agents demonstrate the effectiveness of MEXTRA. Moreover, we explore key factors influencing memory leakage from both the agent designer's and the attacker's perspectives. Our findings highlight the urgent need for effective memory safeguards in LLM agent design and deployment.
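As a rough illustration of what an output-side memory safeguard might look like, the sketch below checks whether an agent response reproduces stored user queries verbatim or nearly verbatim before it is returned. This is not the paper's method or any agent's actual API: the function names `leaked_queries` and `guard_output`, the 0.8 threshold, and the example queries are all hypothetical, and the longest-common-substring heuristic is only one simple way to detect copied memory content.

```python
from difflib import SequenceMatcher


def leaked_queries(output: str, memory_queries: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Return stored queries that appear (near-)verbatim in the agent output.

    Hypothetical heuristic: a query counts as leaked if the longest substring
    it shares with the output covers at least `threshold` of the query.
    """
    out = output.lower()
    leaked = []
    for query in memory_queries:
        q = query.lower().strip()
        if not q:
            continue
        matcher = SequenceMatcher(None, q, out, autojunk=False)
        match = matcher.find_longest_match(0, len(q), 0, len(out))
        if match.size / len(q) >= threshold:
            leaked.append(query)
    return leaked


def guard_output(output: str, memory_queries: list[str],
                 threshold: float = 0.8) -> str:
    """Withhold a response that reproduces stored memory content."""
    if leaked_queries(output, memory_queries, threshold):
        return "[response withheld: it reproduced stored memory content]"
    return output


# Hypothetical memory contents, for illustration only.
memory = [
    "What is the maximum heart rate of patient 10021 last week?",
    "How many times did patient 30045 visit the ICU?",
]
safe = guard_output("The answer is 42.", memory)
blocked = guard_output(
    "Retrieved examples: What is the maximum heart rate of patient 10021 last week?",
    memory,
)
```

A filter of this kind only catches near-verbatim copies; paraphrased or encoded leakage would pass through, so it would complement rather than replace input controls and memory sanitization.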
