InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks
Xinyao Zheng, Husheng Han, Shangyi Shi, Qiyan Fang, Zidong Du, Xing Hu, Qi Guo
TL;DR
InputSnatch demonstrates that common cache-sharing optimizations in LLM inference create practical timing side channels capable of reconstructing private user prompts. The authors develop a two-component attack framework, an Input Constructor and a Time Analyzer, that exploits prefix and semantic caching to recover inputs with high accuracy under realistic constraints. Experiments on vLLM and GPTCache show robust timing patterns that enable field-level recovery and semantic-content leakage across medical and legal domains, underscoring the privacy risks of cloud-based inference. The work highlights a critical trade-off between performance optimization and privacy, and proposes defenses such as per-user cache isolation, rate limiting, and timing obfuscation to mitigate these vulnerabilities. The paper thus offers a thorough analysis of cache-based timing leaks in LLM services and concrete guidance for securing production deployments against such side-channel threats.
Abstract
Large language models (LLMs) possess extensive knowledge and question-answering capabilities and have been widely deployed in privacy-sensitive domains such as finance and medical consultation. During LLM inference, cache-sharing methods are commonly employed to improve efficiency by reusing cached states or responses for identical or similar requests. However, we identify that these cache mechanisms pose a risk of private input leakage, because caching can cause observable variations in response time that serve as a strong signal for timing-based attacks. In this study, we propose a novel timing-based side-channel attack that steals user inputs during LLM inference. The cache-based attack faces the challenge of constructing candidate inputs in a large search space to hit and steal cached user queries. To address this challenge, we propose two primary components. The input constructor employs machine learning techniques and LLM-based approaches for vocabulary correlation learning, and implements optimized search mechanisms for generalized input construction. The time analyzer applies statistical time fitting with outlier elimination to identify cache-hit patterns, continuously providing feedback that refines the constructor's search strategy. We conduct experiments across two cache mechanisms, and the results demonstrate that our approach consistently attains high attack success rates in various applications. Our work highlights the security vulnerabilities associated with performance optimizations, underscoring the necessity of prioritizing privacy and security alongside enhancements in LLM inference.
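
As a rough illustration of the time analyzer's role, the sketch below classifies a batch of probe latencies as a cache hit or miss after discarding outliers. It is not the authors' implementation: the abstract does not specify the fitting method, so a median/MAD outlier filter and a midpoint threshold are substituted here as assumptions. The function names (`remove_outliers`, `robust_mean`, `classify_hit`, `simulated_latency`) and all latency values are hypothetical, and the latencies are simulated rather than measured against any real service.

```python
import random
import statistics


def remove_outliers(samples, k=3.0):
    """Drop samples further than k scaled MADs from the median (assumed filter)."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    if mad == 0:
        # All central samples are identical; nothing sensible to filter on.
        return list(samples)
    cutoff = k * 1.4826 * mad  # 1.4826 scales MAD to a std-dev estimate
    return [s for s in samples if abs(s - med) <= cutoff]


def robust_mean(samples, k=3.0):
    """Mean of the samples that survive outlier elimination."""
    return statistics.fmean(remove_outliers(samples, k))


def classify_hit(probe, hit_baseline, miss_baseline):
    """Return True if the probe looks like a cache hit.

    The threshold is the midpoint between the robust means of two
    calibration sets: known hits and known misses.
    """
    threshold = (robust_mean(hit_baseline) + robust_mean(miss_baseline)) / 2
    return robust_mean(probe) < threshold


def simulated_latency(cached, rng):
    """Invented latency model in ms: hits are faster, with rare large spikes."""
    base = 20.0 if cached else 50.0
    noise = rng.gauss(0.0, 2.0)
    spike = 200.0 if rng.random() < 0.05 else 0.0  # e.g. a network hiccup
    return base + noise + spike


if __name__ == "__main__":
    rng = random.Random(0)
    hits = [simulated_latency(True, rng) for _ in range(30)]
    misses = [simulated_latency(False, rng) for _ in range(30)]
    probe = [simulated_latency(True, rng) for _ in range(30)]
    print("probe classified as cache hit:", classify_hit(probe, hits, misses))
```

In a feedback loop of the kind the abstract describes, a positive classification would tell the input constructor that its candidate matched cached content, so the search could continue from that candidate. The median/MAD filter is chosen here only because occasional latency spikes would otherwise skew a plain mean.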
