
What Was Your Prompt? A Remote Keylogging Attack on AI Assistants

Roy Weiss, Daniel Ayzenshteyn, Guy Amit, Yisroel Mirsky

TL;DR

The paper identifies a token-length side-channel in real-time AI assistant traffic. It formalizes the leak as the token-length sequence $T=[t_1,t_2,\ldots,t_n]$, where $t_i=|r_i|$ is the character length of the $i$-th response token, and aims to recover the plaintext $R=[r_1,\ldots,r_n]$ from the encrypted channel. It introduces a two-LLM inference framework that translates $T$ into text, using forward context and a known-plaintext attack to reduce entropy and improve accuracy. The authors demonstrate substantial leakage across OpenAI ChatGPT-4 and Microsoft Copilot, achieving up to $29\%$ reconstruction accuracy and exposing the topic (cosine similarity $\phi>0.5$) for more than $54\%$ of first segments, with notable transferability between assistants. These findings underscore significant privacy risks and motivate mitigations such as padding, grouping, and batching, which limit information leakage at some cost to user experience and bandwidth.
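
To make the definition of $T$ concrete, the following is a minimal sketch of how token lengths could be read off encrypted record sizes, and of how padding hides them. It assumes, for illustration only, that each token travels in its own record and that encryption adds a constant number of bytes; `RECORD_OVERHEAD`, the block size, and all function names are hypothetical and are not taken from the paper or from any vendor's protocol. Lengths are measured in UTF-8 bytes, which equals the character count for ASCII tokens.

```python
from typing import List

# Hypothetical constant per-record overhead in bytes (assumption for this sketch).
RECORD_OVERHEAD = 29


def observed_sizes(tokens: List[str], overhead: int = RECORD_OVERHEAD) -> List[int]:
    """Record sizes an eavesdropper would see if each token were sent alone."""
    return [len(tok.encode("utf-8")) + overhead for tok in tokens]


def token_lengths(sizes: List[int], overhead: int = RECORD_OVERHEAD) -> List[int]:
    """Recover T = [t_1, ..., t_n] by subtracting the constant overhead."""
    return [size - overhead for size in sizes]


def padded_sizes(tokens: List[str], block: int = 16,
                 overhead: int = RECORD_OVERHEAD) -> List[int]:
    """Record sizes when each payload is padded up to a multiple of `block`."""
    sizes = []
    for tok in tokens:
        n = len(tok.encode("utf-8"))
        padded = -(-n // block) * block  # round n up to the next multiple of block
        sizes.append(padded + overhead)
    return sizes


response = ["I", " can", " help", " with", " that", "."]

T = token_lengths(observed_sizes(response))
print(T)  # [1, 4, 5, 5, 5, 1]

# With padding, every record has the same size, so T is no longer observable.
print(set(padded_sizes(response)))  # {45}
```

The padded variant shows the trade-off mentioned above: the lengths are hidden, but every token now costs a full block of bandwidth.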

Abstract

AI assistants are becoming an integral part of society, used for asking advice or help in personal and confidential issues. In this paper, we unveil a novel side-channel that can be used to read encrypted responses from AI Assistants over the web: the token-length side-channel. We found that many vendors, including OpenAI and Microsoft, have this side-channel. However, inferring the content of a response from a token-length sequence alone proves challenging. This is because tokens are akin to words, and responses can be several sentences long leading to millions of grammatically correct sentences. In this paper, we show how this can be overcome by (1) utilizing the power of a large language model (LLM) to translate these sequences, (2) providing the LLM with inter-sentence context to narrow the search space and (3) performing a known-plaintext attack by fine-tuning the model on the target model's writing style. Using these methods, we were able to accurately reconstruct 29\% of an AI assistant's responses and successfully infer the topic from 55\% of them. To demonstrate the threat, we performed the attack on OpenAI's ChatGPT-4 and Microsoft's Copilot on both browser and API traffic.
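
The claim that a token-length sequence is consistent with a very large number of sentences can be illustrated with a small counting sketch. It assumes a toy vocabulary (`TOY_VOCAB`, invented for this example) and treats each token as a whole word without leading spaces; it counts every word sequence that matches the lengths, grammatical or not, so it is only an upper bound on the ambiguity and not the paper's analysis.

```python
from collections import Counter
from math import prod
from typing import Iterable, List


def candidates_per_length(vocab: Iterable[str]) -> Counter:
    """Count how many vocabulary words exist for each word length."""
    return Counter(len(word) for word in vocab)


def count_matching_sequences(T: List[int], vocab: Iterable[str]) -> int:
    """Number of word sequences whose lengths match T exactly.

    Each position is chosen independently, so the total is the product of
    the per-length candidate counts (zero if some length has no candidate).
    """
    counts = candidates_per_length(vocab)
    return prod(counts[t] for t in T)


# Toy vocabulary, invented for illustration only.
TOY_VOCAB = ["a", "I", "is", "it", "to", "the", "cat", "dog", "sat",
             "rash", "pain", "mild"]

T = [3, 3, 2, 4]
print(count_matching_sequences(T, TOY_VOCAB))  # 144
```

Even with twelve words and four positions there are 144 candidates. With a realistic vocabulary and responses several sentences long, the product grows very quickly, which is why the attack relies on an LLM and on context to narrow the search.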

Paper Structure

This paper contains 25 sections, 1 equation, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Overview of the attack. A packet capture of an AI assistant's real-time response reveals a token-sequence side-channel. The side-channel is parsed to find text segments which are then reconstructed using sentence-level context and knowledge of the target LLM's writing style.
  • Figure 2: An overview of the attack framework: (1) Encrypted traffic is intercepted and then (2) the start of the response is identified. Then (3) the token-length sequence $T$ is extracted and (4) a heuristic is used to partition $T$ into ordered segments ($T_0, T_1, ...$). Finally, (5) each segment is used to infer the text of the response. This is done by (A) using two specialized LLMs to predict each segment sequentially based on prior outputs, (B) generating multiple options for each segment and selecting the best (most confident) result, and (C) resolving the predicted response $\hat{R}$ by concatenating the best segments together.
  • Figure 3: An example showing the trends in the encrypted traffic before and after performing message identification. This example is taken from a response sent from OpenAI's ChatGPT-4 web app. Red bars indicate the start and end of the messages.
  • Figure 4: A sample of attack successes and failures on $R_0$. We consider a cosine similarity of $\phi>0.5$ a successful attack.
  • Figure 5: The performance distribution for 10k first segments. Red indicates the case where the top-ranked result is selected as $\hat{R}_0$. Other colors indicate what the performance could have been had the 'ideal' sample been selected from among the top results.
  • ...and 7 more figures
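
Steps (4) and (5) of Figure 2 and the success criterion of Figure 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the segmentation rule (close a segment after a length-1 token, treated as likely punctuation), the `min_len` parameter, the candidate scores, and the toy embedding vectors are all hypothetical. In the real attack the candidates would come from the fine-tuned LLMs and the vectors from a sentence-embedding model, neither of which is modelled here.

```python
import math
from typing import List, Sequence, Tuple


def partition(T: List[int], min_len: int = 3) -> List[List[int]]:
    """Hypothetical heuristic: end a segment after a length-1 token
    (a likely punctuation mark) once it holds at least `min_len` tokens."""
    segments, current = [], []
    for t in T:
        current.append(t)
        if t == 1 and len(current) >= min_len:
            segments.append(current)
            current = []
    if current:  # keep any trailing tokens as a final segment
        segments.append(current)
    return segments


def select_best(candidates: Sequence[Tuple[str, float]]) -> str:
    """Pick the candidate text with the highest confidence score."""
    return max(candidates, key=lambda c: c[1])[0]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity phi between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


T = [1, 4, 5, 5, 5, 1, 4, 3, 3, 5, 1]
segments = partition(T)
print(segments)  # [[1, 4, 5, 5, 5, 1], [4, 3, 3, 5, 1]]

# Hypothetical candidates for the first segment, as (text, confidence) pairs.
candidates = [("I can offer some tips.", 0.41),
              ("I can help with that.", 0.87)]
best = select_best(candidates)
print(best)  # I can help with that.

# Toy embeddings; the attack counts as successful when phi > 0.5.
phi = cosine_similarity([1.0, 0.0], [1.0, 1.0])
print(phi > 0.5)  # True
```

The predicted response $\hat{R}$ would then be formed by concatenating the best candidate for each segment in order.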