PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
Yubeen Bae, Minchan Kim, Jaejin Lee, Sangbum Kim, Jaehyung Kim, Yejin Choi, Niloofar Mireshghallah
TL;DR
This work addresses the privacy risk of sending private user data to cloud-based LLMs. It proposes a four-stage, privacy-preserving framework that delegates non-private reasoning to a powerful external LLM while keeping sensitive data on a trusted local device. Socratic Chain-of-Thought Reasoning decomposes a complex task into sub-queries, and a homomorphically encrypted vector database answers them through secure, dynamic retrieval with sub-second latency. Key contributions include new inner-product optimizations for encrypted search (batching, caching, butterfly decomposition, and leading-term removal) and a practical API design that enables constant-time updates, with security guarantees based on CKKS and AES-256. Experiments on LoCoMo and MediQ show that the hybrid approach significantly improves over local-only baselines and approaches oracle baselines, while encrypted search maintains high accuracy and scales to 1M entries with minimal overhead. These results indicate that splitting a task between an untrusted high-capacity LLM and a trusted lightweight local model can provide strong privacy without sacrificing performance, a viable path toward deploying private personal AI assistants.
Abstract
Large language models (LLMs) are increasingly used as personal agents, accessing sensitive user data such as calendars, emails, and medical records. Users currently face a trade-off: They can send private records, many of which are stored in remote databases, to powerful but untrusted LLM providers, increasing their exposure risk. Alternatively, they can run less powerful models locally on trusted devices. We bridge this gap. Our Socratic Chain-of-Thought Reasoning first sends a generic, non-private user query to a powerful, untrusted LLM, which generates a Chain-of-Thought (CoT) prompt and detailed sub-queries without accessing user data. Next, we embed these sub-queries and perform encrypted sub-second semantic search using our Homomorphically Encrypted Vector Database across one million entries of a single user's private data. This represents a realistic scale of personal documents, emails, and records accumulated over years of digital activity. Finally, we feed the CoT prompt and the decrypted records to a local language model and generate the final response. On the LoCoMo long-context QA benchmark, our hybrid framework, combining GPT-4o with a local Llama-3.2-1B model, outperforms using GPT-4o alone by up to 7.1 percentage points. This demonstrates a first step toward systems where tasks are decomposed and split between untrusted strong LLMs and weak local ones, preserving user privacy.
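The data flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name here (`remote_plan`, `private_search`, `local_answer`, `run_pipeline`) is hypothetical, both language models are replaced by stubs, the embedding is a toy bag-of-words count vector, and the search runs in plaintext, whereas the paper performs the inner-product search under homomorphic encryption. The sketch only shows which component sees which data.

```python
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def inner_product(u, v):
    """Inner product of two sparse count vectors."""
    return sum(count * v[token] for token, count in u.items())


def remote_plan(generic_query):
    """Stub for the untrusted remote LLM; it receives only the generic query."""
    cot_prompt = (
        f"To answer '{generic_query}', reason step by step over the records."
    )
    sub_queries = ["medication allergy", "doctor appointment date"]
    return cot_prompt, sub_queries


def private_search(records, sub_query, k=1):
    """Plaintext top-k inner-product search.

    Assumption: the paper runs this step on encrypted vectors; no
    encryption is performed in this sketch.
    """
    q = embed(sub_query)
    ranked = sorted(
        records, key=lambda r: inner_product(q, embed(r)), reverse=True
    )
    return ranked[:k]


def local_answer(cot_prompt, retrieved):
    """Stub for the trusted local model; it only assembles the final prompt."""
    context = "\n".join(f"- {r}" for r in retrieved)
    return f"{cot_prompt}\nRecords:\n{context}"


def run_pipeline(generic_query, records):
    # Step 1: the remote model plans without access to private records.
    cot_prompt, sub_queries = remote_plan(generic_query)
    # Step 2: each sub-query retrieves from the private store.
    retrieved = []
    for sub_query in sub_queries:
        for record in private_search(records, sub_query):
            if record not in retrieved:
                retrieved.append(record)
    # Step 3: the local model combines the plan with the private records.
    return local_answer(cot_prompt, retrieved)


PRIVATE_RECORDS = [
    "Allergy note: penicillin allergy recorded in 2019",
    "Appointment with doctor Lee on 2024-03-12",
    "Grocery list: eggs, milk, bread",
]

if __name__ == "__main__":
    print(run_pipeline("What should I tell my doctor?", PRIVATE_RECORDS))
```

In this sketch the private records are passed only to `private_search` and `local_answer`, never to `remote_plan`, which mirrors the separation between the untrusted and trusted components described above.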
