Whistledown: Combining User-Level Privacy with Conversational Coherence in LLMs

Chelsea McMurray; Hayder Tirmazi


TL;DR

Whistledown addresses the privacy risks of using cloud-hosted LLMs for sensitive conversations by introducing a best-effort privacy layer that runs either on the user's device or on a zero-trust enterprise gateway. It combines pseudonymization with $ε$-local differential privacy ($ε$-LDP) and a resource-aware transformation strategy, including a consistent per-session token mapping that preserves conversational coherence. The work describes two deployment modes (Whistledown-Device and Whistledown-Gateway), its core techniques (Bloom-filter filtering, GLiNER named-entity recognition (NER), on-device BERT-TINY embeddings, and gender-aware or embedding-based transformations), and an evaluation showing that latency stays within interactive ranges. The approach provides direct PII protection, optional demographic privacy, and long-term privacy through per-session coherence combined with DP guarantees, offering a practical path to privacy-preserving conversational AI in both personal and enterprise settings.
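To make the idea of a consistent per-session token mapping concrete, here is a minimal Python sketch under stated assumptions. Entity spans are supplied by the caller (in Whistledown they would come from GLiNER NER, which is not reproduced here), and each entity is replaced with a bracketed placeholder such as `[PERSON_1]`. The class `SessionPseudonymizer`, its methods, and the placeholder format are all hypothetical. The paper's gender-aware or embedding-based transformations may substitute realistic surrogate names rather than placeholders.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SessionPseudonymizer:
    """Hypothetical per-session map from real entity strings to stable pseudonyms."""

    forward: dict[str, str] = field(default_factory=dict)   # real string -> pseudonym
    backward: dict[str, str] = field(default_factory=dict)  # pseudonym -> real string
    counters: dict[str, int] = field(default_factory=dict)  # next index per entity label

    def _pseudonym_for(self, entity: str, label: str) -> str:
        # Reuse an existing pseudonym so the same person keeps the same token across turns.
        if entity not in self.forward:
            self.counters[label] = self.counters.get(label, 0) + 1
            token = f"[{label}_{self.counters[label]}]"
            self.forward[entity] = token
            self.backward[token] = entity
        return self.forward[entity]

    def pseudonymize(self, text: str, entities: list[tuple[str, str]]) -> str:
        # entities: (surface string, label) pairs, assumed to come from an NER step.
        # Longer strings are replaced first so "Anna Lee" is not split by "Anna".
        for entity, label in sorted(entities, key=lambda e: len(e[0]), reverse=True):
            token = self._pseudonym_for(entity, label)
            text = re.sub(rf"\b{re.escape(entity)}\b", token, text)
        return text

    def restore(self, text: str) -> str:
        # Map pseudonyms in the LLM response back to the original strings for display.
        for token, entity in self.backward.items():
            text = text.replace(token, entity)
        return text


p = SessionPseudonymizer()
turn1 = p.pseudonymize("Maya took credit for my work.", [("Maya", "PERSON")])
turn2 = p.pseudonymize("Should I confront Maya or Raj?", [("Maya", "PERSON"), ("Raj", "PERSON")])
print(turn1)  # [PERSON_1] took credit for my work.
print(turn2)  # Should I confront [PERSON_1] or [PERSON_2]?
print(p.restore("[PERSON_2] may not know what [PERSON_1] did."))  # Raj may not know what Maya did.
```

Because the mapping persists for the whole session, the LLM sees the same pseudonym for the same person in every turn, which keeps the conversation coherent, while the reverse map lets the client restore real names locally before showing the response.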

Abstract

Users increasingly rely on large language models (LLMs) for personal, emotionally charged, and socially sensitive conversations. However, prompts sent to cloud-hosted models can contain personally identifiable information (PII) that users do not want logged, retained, or leaked. We observe that this risk is especially acute when users discuss friends, coworkers, or adversaries, i.e., when they spill the tea. Enterprises face the same challenge when they want to use LLMs for internal communication and decision-making. In this whitepaper, we present Whistledown, a best-effort privacy layer that modifies prompts before they are sent to the LLM. Whistledown combines pseudonymization and $ε$-local differential privacy ($ε$-LDP) with transformation caching to provide best-effort privacy protection without sacrificing conversational utility. Whistledown is designed to have low compute and memory overhead, so individual users can run it directly on their own devices. For enterprise users, Whistledown is instead deployed centrally within a zero-trust gateway that runs on the enterprise's trusted infrastructure. Whistledown requires no changes to the existing APIs of popular LLM providers.
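The interaction between $ε$-LDP and transformation caching can be illustrated with a small sketch. It assumes, for illustration only, that the LDP mechanism is k-ary randomized response applied to a categorical attribute of a person the user mentions (for example, that person's gender); the paper's actual mechanism, attribute domain, and cache key may differ. The names `k_rr` and `CachedLDPTransform` are hypothetical. Under k-ary randomized response, the true value is reported with probability $e^ε/(e^ε + k - 1)$ and each other value with probability $1/(e^ε + k - 1)$, so the likelihood ratio between any two inputs is at most $e^ε$.

```python
import math
import random
from typing import Optional


def k_rr(true_value: str, domain: list[str], epsilon: float, rng: random.Random) -> str:
    """k-ary randomized response: an epsilon-LDP report of one categorical value."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return true_value
    # Otherwise report one of the other k - 1 values uniformly at random.
    others = [v for v in domain if v != true_value]
    return rng.choice(others)


class CachedLDPTransform:
    """Hypothetical cache: randomize each (session, entity) attribute once, then reuse it."""

    def __init__(self, domain: list[str], epsilon: float, seed: Optional[int] = None):
        self.domain = domain
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.cache: dict[tuple[str, str], str] = {}

    def transform(self, session_id: str, entity: str, true_value: str) -> str:
        key = (session_id, entity)
        if key not in self.cache:
            # Draw the randomized report only once per session and entity.
            self.cache[key] = k_rr(true_value, self.domain, self.epsilon, self.rng)
        return self.cache[key]


t = CachedLDPTransform(["female", "male", "nonbinary"], epsilon=1.0, seed=7)
reported = t.transform("session-1", "Maya", "female")
# Every later turn in the same session sees the same reported value.
assert all(t.transform("session-1", "Maya", "female") == reported for _ in range(10))
```

Caching matters here for privacy as well as coherence: if a fresh randomized response were drawn on every turn, an observer who sees many turns could average the reports and recover the true value, since independent releases compose to a weaker overall guarantee. Randomizing once per session and reusing the result means the session's reports amount to a single $ε$-LDP release for that attribute.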
