
Rescriber: Smaller-LLM-Powered User-Led Data Minimization for LLM-Based Chatbots

Jijie Zhou, Eryue Xu, Yaoyao Wu, Tianshi Li

TL;DR

Rescriber tackles the challenge of privacy in LLM-based chatbots by enabling user-led data minimization through a browser extension. It supports two back-ends, a smaller Llama3-8B model running on-device and the cloud-based GPT-4o, and provides real-time PII detection, redaction, and abstraction, with write-back of the original information to preserve utility. In a mixed-methods study with 12 participants, Rescriber reduced unnecessary disclosure and improved perceived privacy protection; detection completeness and sanitization consistency emerged as key trust factors. The work demonstrates the feasibility of smaller-LLM-powered, user-facing privacy controls as a practical, trust-enhancing approach to privacy in AI-assisted conversations.

Abstract

The proliferation of LLM-based conversational agents has resulted in excessive disclosure of identifiable or sensitive information. However, existing technologies fail to offer perceptible control or account for users' personal preferences about privacy-utility tradeoffs due to the lack of user involvement. To bridge this gap, we designed, built, and evaluated Rescriber, a browser extension that supports user-led data minimization in LLM-based conversational agents by helping users detect and sanitize personal information in their prompts. Our studies (N=12) showed that Rescriber helped users reduce unnecessary disclosure and addressed their privacy concerns. Users' subjective perceptions of the system powered by Llama3-8B were on par with those of the system powered by GPT-4o. The comprehensiveness and consistency of detection and sanitization emerged as essential factors affecting users' trust and perceived protection. Our findings confirm the viability of smaller-LLM-powered, user-facing, on-device privacy controls, presenting a promising approach to address the privacy and trust challenges of AI.

Paper Structure

This paper contains 92 sections, 2 figures, 17 tables.

Figures (2)

  • Figure 1: A snapshot of the Rescriber user experience when using ChatGPT for data analysis. Rescriber displays a tooltip that highlights the detected personal information in the user's message (A) and offers a control panel (B) where users can replace the information with a placeholder (B1), abstract it to a more general version (B2), or revert these actions (B3). All changes take effect immediately in the message in the input box (C), which is the text that will be sent. In this example, the user replaced the detected names (e.g., James Williams) with corresponding placeholders (e.g., [NAME4]). In the sent messages (D1) and ChatGPT's responses (D2), Rescriber replaces the placeholders with the original PII to improve readability and facilitate copying. When the user hovers over the highlighted PII, the placeholders actually used in the sent and received messages are revealed (D).
  • Figure 2: Main stages and features of Rescriber: 1) the user types a message in ChatGPT's prompt entry box; 2) Rescriber automatically detects and highlights the sensitive information; 3) the user sanitizes the message based on Rescriber's suggestions; 4) GPT generates a response based on the sanitized message; and 5) if the response uses any information the user chose to redact, Rescriber replaces it with the original information for better utility.
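The placeholder round trip described in the figure captions (replace detected PII with placeholders such as [NAME4] before sending, then write the originals back into the response) can be sketched as below. This is a minimal illustration, not Rescriber's implementation. It assumes detection has already happened (in Rescriber that step is done by Llama3-8B or GPT-4o) and that its results arrive as hypothetical (text, label) pairs; PlaceholderMap, sanitize, and restore are hypothetical names, and the abstraction option (B2) is not covered. A repeated value reuses its placeholder, which is one simple way to keep sanitization consistent within a conversation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PlaceholderMap:
    """Hypothetical per-conversation store mapping original PII strings to placeholders."""
    to_placeholder: dict[str, str] = field(default_factory=dict)
    counters: dict[str, int] = field(default_factory=dict)

    def placeholder_for(self, text: str, label: str) -> str:
        # A repeated value gets the same placeholder, keeping sanitization consistent.
        if text not in self.to_placeholder:
            self.counters[label] = self.counters.get(label, 0) + 1
            self.to_placeholder[text] = f"[{label}{self.counters[label]}]"
        return self.to_placeholder[text]


def _replace_all(text: str, mapping: dict[str, str]) -> str:
    # Single-pass substitution; longer keys come first in the alternation so
    # "James Williams" wins over a shorter overlapping key such as "James".
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def sanitize(message: str, detections: list[tuple[str, str]], pmap: PlaceholderMap) -> str:
    """Replace each detected (text, label) span with its placeholder before sending."""
    mapping = {text: pmap.placeholder_for(text, label) for text, label in detections}
    return _replace_all(message, mapping)


def restore(response: str, pmap: PlaceholderMap) -> str:
    """Write-back: swap placeholders in the chatbot's reply for the original values."""
    reverse = {ph: original for original, ph in pmap.to_placeholder.items()}
    return _replace_all(response, reverse)


if __name__ == "__main__":
    pmap = PlaceholderMap()
    sent = sanitize("Compare sales for James Williams and Ana Diaz.",
                    [("James Williams", "NAME"), ("Ana Diaz", "NAME")], pmap)
    print(sent)  # Compare sales for [NAME1] and [NAME2].
    print(restore("[NAME1] outsold [NAME2] in Q3.", pmap))  # James Williams outsold Ana Diaz in Q3.
```

Only the sanitized string would leave the browser in this sketch; the mapping from placeholders back to originals stays local, which is what lets the write-back step restore readability without disclosing the original values.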