Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models

Tianchen Zhang, Gururaj Saileshwar, David Lie

TL;DR

This work introduces a timing side-channel in large language models based on output token counts. It exploits autoregressive decoding and tokenizer biases to infer private attributes such as the target language of a translation and the output class of a classification. By building offline 2D profiles from Output Token Density and Output-Input Ratio and fitting Gaussian Mixture Models to them, the authors report average attack success rates (ASR) in the low-to-mid 80% range across multiple models and tasks, including end-to-end network scenarios. They demonstrate translation-language leakage with ~83% ASR across Tower, M2M100, and MBart50, and classification leakage across open-source and production models, with GPT-4o showing strong leakage in some settings. The paper also explores end-to-end timing attacks for cases where exact token counts are unavailable and discusses tokenizer-, prompt-, and system-level mitigations, underscoring the practical privacy risk these timing signals pose in real-world LLM deployments.

Abstract

This paper demonstrates a new side-channel that enables an adversary to extract sensitive information about inference inputs in large language models (LLMs) based on the number of output tokens in the LLM response. We construct attacks using this side-channel in two common LLM tasks: recovering the target language in machine translation tasks and recovering the output class in classification tasks. In addition, due to the auto-regressive generation mechanism in LLMs, an adversary can recover the output token count reliably using a timing channel, even over the network against a popular closed-source commercial LLM. Our experiments show that an adversary can learn the output language in translation tasks with more than 75% precision across three different models (Tower, M2M100, MBart50). Using this side-channel, we also show the input class in text classification tasks can be leaked out with more than 70% precision from open-source LLMs like Llama-3.1, Llama-3.2, Gemma2, and production models like GPT-4o. Finally, we propose tokenizer-, system-, and prompt-based mitigations against the output token count side-channel.
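
The abstract's claim that auto-regressive generation lets response time stand in for output token count can be illustrated with a minimal sketch. It assumes a simple linear latency model (response time ≈ fixed overhead + per-token cost × token count), which is an illustrative simplification and not necessarily the estimator used in the paper. The function names and the calibration numbers are hypothetical.

```python
def fit_latency_model(token_counts, response_times):
    """Least-squares fit of: response_time ~ base_latency + per_token * tokens.

    Assumption: latency grows linearly with the number of generated tokens.
    """
    n = len(token_counts)
    mean_x = sum(token_counts) / n
    mean_y = sum(response_times) / n
    sxx = sum((x - mean_x) ** 2 for x in token_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(token_counts, response_times))
    per_token = sxy / sxx
    base_latency = mean_y - per_token * mean_x
    return base_latency, per_token


def estimate_token_count(response_time, base_latency, per_token):
    """Invert the fitted model to turn an observed time into a token count."""
    return max(0, round((response_time - base_latency) / per_token))


# Hypothetical calibration data: the attacker queries the service with
# prompts whose output token counts are known and records the times (seconds).
calib_tokens = [10, 20, 40, 80]
calib_times = [0.45, 0.70, 1.20, 2.20]

base, per_tok = fit_latency_model(calib_tokens, calib_times)
estimated = estimate_token_count(1.45, base, per_tok)
print(estimated)
```

In practice network jitter and server load add noise to each measurement, so an estimate of this kind would be approximate rather than exact.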

Paper Structure

This paper contains 51 sections, 6 equations, 17 figures, 11 tables.

Figures (17)

  • Figure 1: Example of our side-channel attacks on text sentiment classification with an LLM. (a) For a given text input, the LLM outputs the classified sentiment and also provides an explanation that can vary in length. (b) Our token-count side-channel attack leaks the output class (sentiment) based on the bias in the output token counts, or by measuring the execution time, as a proxy for the output token count.
  • Figure 2: Threat model. We assume the attacker is network-based, and cannot inspect the contents of the encrypted input or output, but can monitor the overall response time, and length of the entire encrypted input or output.
  • Figure 3: Overview of Attack on Translation. A user translates text in a source language (e.g., English) to any target language; the goal of the attack is to leak this target language. By profiling output token density and output/input bytes ratio for different languages offline, and by observing these for a given translation, the attacker can leak the user's target language.
  • Figure 4: Profile using a Gaussian Mixture Model on a 2D decision space for the Tower model with the source language as English.
  • Figure 5: Attack success rates (ASR) for the attack leaking the output languages for different translation models, using source language as English.
  • ...and 12 more figures
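
The profiling idea in Figures 3 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it fits a single axis-aligned Gaussian per language instead of a full Gaussian Mixture Model, it assumes output token density means tokens per output byte and output/input ratio means output bytes per input byte, and the language labels and sample values are synthetic placeholders.

```python
import math


def features(token_count, output_bytes, input_bytes):
    """Assumed feature definitions (hypothetical, for illustration only)."""
    density = token_count / output_bytes   # output token density
    ratio = output_bytes / input_bytes     # output/input bytes ratio
    return (density, ratio)


def fit_profile(samples):
    """Fit one diagonal-covariance 2D Gaussian to offline profiling samples."""
    n = len(samples)
    means = [sum(s[d] for s in samples) / n for d in range(2)]
    variances = [
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-9)
        for d in range(2)
    ]
    return means, variances


def log_likelihood(x, profile):
    """Log-density of point x under a diagonal Gaussian profile."""
    means, variances = profile
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, profiles):
    """Pick the language whose profile gives x the highest likelihood."""
    return max(profiles, key=lambda lang: log_likelihood(x, profiles[lang]))


# Synthetic offline profiles: (density, ratio) points per placeholder language.
profile_samples = {
    "lang_A": [(0.25, 1.00), (0.27, 1.10), (0.24, 0.95)],
    "lang_B": [(0.45, 1.80), (0.48, 1.90), (0.43, 1.70)],
}
profiles = {lang: fit_profile(s) for lang, s in profile_samples.items()}

# Online phase: features observed for one encrypted translation request.
observed = features(token_count=46, output_bytes=100, input_bytes=55)
guess = classify(observed, profiles)
print(guess)
```

A mixture model with several components per language would capture multi-modal profiles that a single Gaussian cannot, at the cost of more profiling data.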