Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models

Chun Jie Chong, Chenxi Hou, Zhihao Yao, Seyed Mohammadjavad Seyed Talebi

TL;DR

Casper addresses privacy risks in web-based LLM use with a client-side browser extension that sanitizes prompts before they reach LLM services. It uses a three-layer pipeline (rule-based filtering, named-entity recognition, and local LLM-based topic identification) to redact PII with placeholders and flag privacy-sensitive topics for user review, all without modifying the LLM services. Evaluation on 4000 synthetic prompts shows 98.5% PII detection accuracy and 89.9% topic-detection accuracy, with acceptable overhead (under roughly 15% processing cost and modest memory usage). The work demonstrates a practical, compatible privacy solution that preserves usability while giving users control, and it lays a foundation for further client-side privacy enhancements in LLM-enabled workflows.

Abstract

Web-based Large Language Model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins enhance the functionality of LLMs by enabling access to real-world data and services. However, the privacy consequences associated with these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. In this paper, we propose Casper, a prompt sanitization technique that aims to protect user privacy by detecting and removing sensitive information from user inputs before sending them to LLM services. Casper runs entirely on the user's device as a browser extension and does not require any changes to the online LLM services. At the core of Casper is a three-layered sanitization mechanism consisting of a rule-based filter, a Machine Learning (ML)-based named entity recognizer, and a browser-based local LLM topic identifier. We evaluate Casper on a dataset of 4000 synthesized prompts and show that it can effectively filter out Personally Identifiable Information (PII) and privacy-sensitive topics with high accuracy, at 98.5% and 89.9%, respectively.
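
The three-layered mechanism described above can be illustrated with a minimal sketch. This is not Casper's implementation: the regular expressions, the placeholder format (`[LABEL_n]`), the `toy_recognizer` stand-in for the ML-based named entity recognizer, and the keyword list standing in for the local LLM topic identifier are all assumptions made for illustration. Python is used for brevity, whereas the real system runs as a browser extension.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules; the paper's actual rule set is not reproduced here.
RULES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical keywords standing in for the local LLM topic identifier.
SENSITIVE_TOPICS = {
    "medical": ("diagnosis", "prescription", "symptom"),
    "legal": ("lawsuit", "attorney", "custody"),
}


@dataclass
class SanitizedPrompt:
    text: str
    placeholders: dict = field(default_factory=dict)  # placeholder -> original value
    flagged_topics: list = field(default_factory=list)


def rule_based_filter(result):
    """Layer 1: replace pattern matches with numbered placeholders."""
    for label, pattern in RULES.items():
        def substitute(match, label=label):
            placeholder = f"[{label}_{len(result.placeholders) + 1}]"
            result.placeholders[placeholder] = match.group(0)
            return placeholder
        result.text = pattern.sub(substitute, result.text)
    return result


def toy_recognizer(text):
    """Assumed stand-in for an ML NER model: returns (entity_text, label) pairs."""
    known = {"Jane Doe": "PERSON"}
    return [(name, label) for name, label in known.items() if name in text]


def ner_filter(result, recognize):
    """Layer 2: redact entities reported by the recognizer callable."""
    for entity_text, label in recognize(result.text):
        placeholder = f"[{label}_{len(result.placeholders) + 1}]"
        result.placeholders[placeholder] = entity_text
        result.text = result.text.replace(entity_text, placeholder)
    return result


def topic_identifier(result):
    """Layer 3: flag sensitive topics for user review (text is not altered)."""
    lowered = result.text.lower()
    for topic, keywords in SENSITIVE_TOPICS.items():
        if any(keyword in lowered for keyword in keywords):
            result.flagged_topics.append(topic)
    return result


def sanitize(prompt, recognize=toy_recognizer):
    """Run the three layers in order and return the sanitized prompt."""
    result = SanitizedPrompt(text=prompt)
    rule_based_filter(result)
    ner_filter(result, recognize)
    topic_identifier(result)
    return result


if __name__ == "__main__":
    out = sanitize("I am Jane Doe, email jane@example.com, phone 555-123-4567. "
                   "Explain my diagnosis.")
    print(out.text)
    print(out.flagged_topics)
```

In this sketch, PII is replaced in the text while topics are only flagged, which mirrors the redact-versus-review split in the summary. Keeping the placeholder-to-original mapping on the client would, in principle, let an extension restore the original values in the LLM's response; whether Casper does so is not stated here.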

Paper Structure (49 sections, 6 figures, 6 tables, 1 algorithm)

This paper contains 49 sections, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Comparison of online LLM services and Casper architecture. Blue represents trusted components, green represents minimal privacy risk, yellow represents increased privacy risk, and red represents substantial privacy risk.
  • Figure 2: Three-layered prompt sanitization in Casper.
  • Figure 3: Venn diagram of the coverage of three stages of prompt sanitization in Casper.
  • Figure 4: Popup message box that indicates private and sensitive information found by Casper.
  • Figure 5: Time to process a prompt in each stage of Casper.
  • ...and 1 more figure