
"Ghost of the past": identifying and resolving privacy leakage from LLM's memory through proactive user interaction

Shuning Zhang, Lyumanshan Ye, Xin Yi, Jingyu Tang, Bo Shui, Haobin Xing, Pengfei Liu, Hewu Li

TL;DR

This study contributes to privacy-conscious LLM design, offering insights into privacy protection for Human-AI interactions, and proposes MemoAnalyzer, a system for identifying, visualizing, and managing private information within memories.

Abstract

Memories, encompassing past inputs in context window and retrieval-augmented generation (RAG), frequently surface during human-LLM interactions, yet users are often unaware of their presence and the associated privacy risks. To address this, we propose MemoAnalyzer, a system for identifying, visualizing, and managing private information within memories. A semi-structured interview (N=40) revealed that low privacy awareness was the primary challenge, while proactive privacy control emerged as the most common user need. MemoAnalyzer uses a prompt-based method to infer and identify sensitive information from aggregated past inputs, allowing users to easily modify sensitive content. Background color temperature and transparency are mapped to inference confidence and sensitivity, streamlining privacy adjustments. A 5-day evaluation (N=36) comparing MemoAnalyzer with the default GPT setting and a manual modification baseline showed MemoAnalyzer significantly improved privacy awareness and protection without compromising interaction speed. Our study contributes to privacy-conscious LLM design, offering insights into privacy protection for Human-AI interactions.

Paper Structure

This paper contains 46 sections, 1 equation, 8 figures, 1 table.

Figures (8)

  • Figure 1: Participants' familiarity with AI and frequency of AI usage (5: most familiar and frequent, 1: least familiar and frequent). The cross marks the median and the square marks the mean.
  • Figure 2: MemoAnalyzer's different functions. (A) The user clicks the notification that attracts their curiosity. (B) The past inputs and memories used to infer the private information are expanded below. The specific phrases used for inference are highlighted to facilitate users' modification. (B1, B2) Users can edit or delete the memories, and edit the past input. (C) The user clicks the "save changes" button after modification to save the changes. (D) The inferred private information disappears or changes after the user's modification.
  • Figure 3: The experiment platform for the different techniques: (a) MemoAnalyzer, (b) GPT-4o, (c) Manual.
  • Figure 4: An overview of the study's process. Interviews take place on Day 1 and Day 5, and questionnaires follow each day's tasks.
  • Figure 5: The heatmap of inferred information for different privacy information categories (the final column is the sum). The numbers denote the counts of private information items inferred using LLMs, averaged across participants. The horizontal axis denotes the technology (MemoAnalyzer, GPT-4o, Manual) and the LLM used for inference (GPT-4o, qwen, qwen-7b). The numbers for MemoAnalyzer are outlined in black.
  • ...and 3 more figures