Confidential Prompting: Privacy-preserving LLM Inference on Cloud
Caihua Li, In Gim, Lin Zhong
TL;DR
This work addresses the privacy risks of cloud-hosted LLM inference by protecting user prompts from untrusted cloud and LLM providers. It introduces Petridish, a system that runs the LLM inside a confidential VM and uses Secure Partitioned Decoding (SPD) to separate per-user input processing from batched decoding, preserving model confidentiality and output fidelity while enabling auditable protection. The authors formalize a lossless attention partitioning approach, implement a prototype on NVIDIA GPU-accelerated confidential computing (CC) hardware, and show that SPD scales efficiently, with lower latency than per-user isolated deployments. Overall, Petridish demonstrates a practical path toward privacy-preserving, auditable, cloud-based LLM services suitable for handling sensitive data such as clinical or financial records, without sacrificing utility.
Abstract
This paper introduces a vision of confidential prompting: securing user prompts from an untrusted, cloud-hosted large language model (LLM) while preserving model confidentiality, output invariance, and compute efficiency. As a first step toward this vision, we present Petridish, a system built on confidential computing whose core contribution is a novel technique called Secure Partitioned Decoding (SPD). Petridish runs the LLM service inside a confidential virtual machine (CVM), which protects the secrets, i.e., the LLM parameters and user prompts, from adversaries outside the CVM. Using SPD, it splits the LLM service for each user into two processes: a per-user process performs prefill with the user's prompts and computes attention scores during decoding; a service process, shared by all users, batches the attention scores from the per-user processes and generates output tokens for all users. Both the LLM provider and the users trust Petridish's CVM and its operating system, which guarantees isolation between processes and limits their outbound network capabilities to control information flow. The CVM's attestation capability and its open-source software stack enable Petridish to provide auditable protection of the confidentiality of both user prompts and the LLM. Together, these mechanisms let Petridish maintain the full utility of the LLM service and enable practical, privacy-preserving cloud-hosted LLM inference for sensitive applications, such as processing personal data, clinical records, and financial documents.
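To make the idea of partitioned attention concrete, the following is a minimal sketch of a standard identity, not Petridish's actual implementation: attention over a full key-value sequence can be computed separately over two disjoint partitions and merged exactly (up to floating-point rounding), provided each partition also reports the log-sum-exp of its scores. The split into a "private" partition (prompt state held by a per-user process) and a "public" partition (state held by the shared service process) is an assumption made for illustration; the function names `partial_attention` and `merge` and the toy data are hypothetical.

```python
import math

def partial_attention(q, keys, values):
    """Attention of one query over one key-value partition.

    Returns (o, lse): the partition-local softmax-weighted average of the
    values, and the log-sum-exp of the partition's scaled scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    o = [sum(w * v[d] for w, v in zip(weights, values)) / total
         for d in range(dim)]
    return o, m + math.log(total)

def merge(o_a, lse_a, o_b, lse_b):
    """Combine two partition results into attention over their union."""
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)  # proportional to partition A's softmax mass
    w_b = math.exp(lse_b - m)  # proportional to partition B's softmax mass
    return [(w_a * a + w_b * b) / (w_a + w_b) for a, b in zip(o_a, o_b)]

# Toy data (hypothetical): one query, three "private" and two "public" entries.
q = [0.5, -1.0, 0.25, 2.0]
private_k = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.3, -0.1, 0.8], [-1.0, 0.4, 0.0, 0.6]]
private_v = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
public_k = [[0.7, -0.2, 0.1, 0.9], [0.0, 1.0, -1.0, 0.3]]
public_v = [[-2.0, 1.0], [0.5, 0.5]]

# Reference: attention over the concatenated sequence in one process.
full, _ = partial_attention(q, private_k + public_k, private_v + public_v)

# Partitioned: each side sees only its own keys and values.
o_priv, lse_priv = partial_attention(q, private_k, private_v)
o_pub, lse_pub = partial_attention(q, public_k, public_v)
merged = merge(o_priv, lse_priv, o_pub, lse_pub)

max_abs_error = max(abs(a - b) for a, b in zip(full, merged))
```

In this sketch the only values that cross the partition boundary are a per-partition output vector and a scalar log-sum-exp, not the keys and values themselves. Whether that matches the information flow Petridish permits is not stated in the abstract and should be checked against the paper's formalization.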
