Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

Kexin Chu; Zecheng Lin; Dawei Xiang; Zixu Shen; Jianchang Su; Cheng Chu; Yiwei Yang; Wenhui Zhang; Wenfei Wu; Wei Zhang

TL;DR

This paper tackles the privacy risk introduced by global KV-cache sharing in multi-tenant LLM inference by proposing SafeKV, a system that co-designs privacy detection with cache management. It introduces a three-tier asynchronous detection pipeline, a unified radix-tree memory manager with path compression and progressive eviction, and a runtime safeguard guided by the Reuse Diversity Ratio (RDR) to bound leakage. In the evaluation, SafeKV reduces time-to-first-token (TTFT) overhead relative to full isolation by up to 40.58% and raises throughput by up to 2.66x while preserving most cache-reuse benefits. The approach provides practical, scalable privacy for multi-tenant LLM serving with limited impact on inference latency and reuse efficiency.

Abstract

Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive user inputs from shared entries, leading to cross-tenant privacy risks. To address this problem, we introduce SafeKV (Secure and Flexible KV-cache Sharing), a system-level co-design of privacy enforcement and KV-cache management. SafeKV integrates lightweight detection and isolation directly into the serving runtime to eliminate cross-tenant reuse of sensitive KV-cache blocks under our threat model, while recovering most of the performance benefits of global sharing. Our key contributions are: (1) a three-tier asynchronous detection pipeline that decouples privacy classification from inference and supports streaming workloads, (2) a unified radix-tree-based memory manager with path compression and sensitivity-aware eviction for scalable selective isolation, and (3) an RDR-guided (Reuse Diversity Ratio) runtime safeguard that detects and bounds residual leakage. On large LLM backends, SafeKV reduces the time-to-first-token (TTFT) overhead compared to full isolation by up to 40.58% and raises throughput by up to 2.66x. Overall, SafeKV restores the efficiency of KV reuse while enforcing strong, practical privacy for multi-tenant LLM inference.
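
The core idea of selective sharing can be illustrated with a minimal sketch. This is not SafeKV's implementation: the class name `SelectiveKVCache`, the regex-based `SENSITIVE` detector, and the flat dictionaries are all hypothetical simplifications standing in for the detection pipeline and radix-tree manager described above. The sketch only shows the policy that entries judged sensitive are reusable by the owning tenant alone, while all other entries remain globally shared.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a privacy detector (assumption, not from the paper):
# flags email-like strings and runs of six or more digits.
SENSITIVE = re.compile(r"[\w.]+@[\w.]+|\d{6,}")


class SelectiveKVCache:
    """Toy cache: sensitive prefixes are isolated per tenant, others are shared."""

    def __init__(self) -> None:
        self.shared: Dict[str, str] = {}               # prefix -> KV handle
        self.private: Dict[Tuple[str, str], str] = {}  # (tenant, prefix) -> KV handle

    def insert(self, tenant: str, prefix: str, kv: str) -> None:
        # Sensitive entries are keyed by tenant, so no other tenant can hit them.
        if SENSITIVE.search(prefix):
            self.private[(tenant, prefix)] = kv
        else:
            self.shared[prefix] = kv

    def lookup(self, tenant: str, prefix: str) -> Optional[str]:
        # Check the tenant's own private entries first, then the shared pool.
        hit = self.private.get((tenant, prefix))
        return hit if hit is not None else self.shared.get(prefix)


cache = SelectiveKVCache()
cache.insert("alice", "You are a helpful assistant.", "kv0")
cache.insert("alice", "My card number is 1234567890", "kv1")
```

In this sketch a second tenant still hits the cached system prompt, but a probe for the first tenant's sensitive prefix misses, so its latency would look the same as for an uncached request.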
