Remote Timing Attacks on Efficient Language Model Inference

Nicholas Carlini, Milad Nasr

TL;DR

This work shows that a network adversary can learn the topic of a user's conversation with 90%+ precision on open-source systems, and that on production systems such as OpenAI's ChatGPT and Anthropic's Claude the authors can distinguish between specific messages or infer the user's language.

Abstract

Scaling up language models has significantly increased their capabilities. But larger models are slower models, and so there is now an extensive body of work (e.g., speculative sampling or parallel decoding) that improves the (average-case) efficiency of language model generation. But these techniques introduce data-dependent timing characteristics. We show it is possible to exploit these timing differences to mount a timing attack. By monitoring the (encrypted) network traffic between a victim user and a remote language model, we can learn information about the content of messages by noting when responses are faster or slower. With complete black-box access, on open-source systems we show how it is possible to learn the topic of a user's conversation (e.g., medical advice vs. coding assistance) with 90%+ precision, and on production systems like OpenAI's ChatGPT and Anthropic's Claude we can distinguish between specific messages or infer the user's language. We further show that an active adversary can leverage a boosting attack to recover PII placed in messages (e.g., phone numbers or credit card numbers) for open-source systems. We conclude with potential defenses and directions for future work.
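To make the "data-dependent timing" point concrete, the toy simulation below counts how many target-model verification rounds speculative sampling needs to emit a fixed number of tokens. It is a minimal sketch under loudly stated assumptions, not the paper's model: each drafted token is accepted independently with a fixed probability `accept_prob`, the draft length is `k = 4`, and every round costs a constant `ms_per_round`. The function names are hypothetical, and parallel decoding is not modelled.

```python
import random


def speculative_rounds(n_tokens: int, accept_prob: float, k: int = 4, seed: int = 0) -> int:
    """Count target-model verification rounds needed to emit n_tokens.

    Toy assumptions (not from the paper): each drafted token is accepted
    independently with probability accept_prob, and the first rejection ends
    the round. Every round also yields one token from the target model (the
    correction after a rejection, or a bonus token when all k drafts pass).
    """
    rng = random.Random(seed)
    emitted = 0
    rounds = 0
    while emitted < n_tokens:
        rounds += 1
        accepted = 0
        # Keep the longest prefix of the k drafted tokens that passes verification.
        while accepted < k and rng.random() < accept_prob:
            accepted += 1
        emitted += accepted + 1
    return rounds


def simulated_latency_ms(n_tokens: int, accept_prob: float, ms_per_round: float = 30.0, **kw) -> float:
    """Total latency if every verification round costs a fixed ms_per_round (an assumption)."""
    return speculative_rounds(n_tokens, accept_prob, **kw) * ms_per_round


# Highly predictable output (e.g., counting) vs. hard-to-predict output (e.g., random numbers).
print(simulated_latency_ms(100, accept_prob=1.0))  # 20 rounds -> 600.0
print(simulated_latency_ms(100, accept_prob=0.0))  # 100 rounds -> 3000.0
```

Because the same number of output tokens can take anywhere from 20 to 100 rounds in this toy setting, response time depends on how predictable the text is, which is exactly the signal a passive observer of encrypted traffic can measure.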

Paper Structure

This paper contains 60 sections, 5 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: All efficient inference methods we tested are vulnerable to timing attacks, and we can reliably distinguish between two queries with a near-100% attack success rate.
  • Figure 2: Our attack reliably distinguishes between a user asking for coding assistance and one asking for medical advice across all five efficient inference strategies. When a user interacts with a model over multiple rounds of back-and-forth conversation, the attack success rate grows considerably. Some methods are much more vulnerable than others, with attack success rates of up to 100%.
  • Figure 3: Later versions of GPT-4 implement more advanced efficient inference techniques. Whereas the June 2023 model takes just as long to answer easy queries as hard ones, and the January 2024 "preview" model is 10% faster on easy queries than on hard ones, the April 2024 release of the gpt-4-turbo model is twice as fast on easy queries as on hard ones.
  • Figure 4: A network adversary can reliably distinguish between a user who has asked GPT-3.5 to either "generate the first 50 numbers" or "generate 50 random numbers". By training a Gaussian Mixture Model on the packet response times, we achieve 94.7% accuracy at distinguishing these two queries and can label over 50% of interactions with perfect precision.
  • Figure 5: Inter-packet delay on a token-by-token basis for two different queries, with 100 independent results overlaid.
  • ...and 7 more figures
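The captions for Figures 2, 4, and 5 describe classifying queries from packet timing, but do not specify the exact features or mixture configuration. The sketch below is a simplified stand-in, not the authors' method: it reduces each response to its mean inter-packet delay, fits a single Gaussian per query class (a one-component mixture) from labelled training traces, sums per-turn log-likelihood ratios across a multi-turn conversation (treating turns as independent, an assumption), and abstains when the evidence falls below a threshold. All function and class names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def mean_inter_packet_delay(timestamps: Sequence[float]) -> float:
    """Average gap (seconds) between consecutive packet arrivals in one response.

    Requires at least two timestamps.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return mean(gaps)


class Gaussian1D:
    """Single Gaussian fitted to one scalar timing feature for one query class."""

    def __init__(self, values: Sequence[float]):
        self.mu = mean(values)
        self.sigma = max(pstdev(values), 1e-9)  # avoid division by zero on identical samples

    def log_pdf(self, x: float) -> float:
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2.0 * math.pi))


def fit(traces_a: List[Sequence[float]], traces_b: List[Sequence[float]]) -> Tuple[Gaussian1D, Gaussian1D]:
    """Fit one Gaussian per query class from labelled training traces."""
    model_a = Gaussian1D([mean_inter_packet_delay(t) for t in traces_a])
    model_b = Gaussian1D([mean_inter_packet_delay(t) for t in traces_b])
    return model_a, model_b


def log_likelihood_ratio(turns: List[Sequence[float]], model_a: Gaussian1D, model_b: Gaussian1D) -> float:
    """Sum per-turn log-likelihood ratios log p(x|A) - log p(x|B) over all turns."""
    return sum(model_a.log_pdf(x) - model_b.log_pdf(x) for x in map(mean_inter_packet_delay, turns))


def classify(turns: List[Sequence[float]], model_a: Gaussian1D, model_b: Gaussian1D,
             threshold: float = 0.0) -> Optional[str]:
    """Return "A", "B", or None to abstain when |LLR| does not exceed threshold."""
    llr = log_likelihood_ratio(turns, model_a, model_b)
    if llr > threshold:
        return "A"
    if llr < -threshold:
        return "B"
    return None
```

Summing log-likelihood ratios means each additional turn adds more evidence, which is one simple way to see why multi-turn conversations are easier to classify. Raising `threshold` makes the classifier abstain on borderline interactions, trading coverage for precision; the captions do not say how the authors selected the subset they label with perfect precision.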