Excuse me, sir? Your language model is leaking (information)

Or Zamir

TL;DR

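This work introduces a cryptographic method for hiding an arbitrary secret payload in the response of a Large Language Model (LLM). Without the secret key, responses that carry a payload are provably indistinguishable from the original model's responses, so the quality of the generated text is unaffected.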

Abstract

We introduce a cryptographic method to hide an arbitrary secret payload in the response of a Large Language Model (LLM). A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for LLMs.
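
To make the idea of key-dependent hiding and extraction concrete, here is a minimal toy sketch. It is not the paper's $\mathsf{Steg}_k$ or $\mathsf{Retr}_k$. It assumes a binary-token "model" whose next-token probabilities are simply given as a list, hides a single payload bit, and uses HMAC-SHA256 as the keyed pseudorandom function. All function names (`prf_unit`, `embed_bit`, `score`, `retrieve_bit`) are hypothetical and chosen for illustration only. Each token is set to 1 exactly when a keyed pseudorandom number falls below the model's probability, so each token is still 1 with the probability the model assigns. The decoder recomputes the pseudorandom numbers for both candidate bits and keeps the candidate whose numbers best explain the tokens.

```python
import hashlib
import hmac
import math


def prf_unit(secret: bytes, index: int, bit: int) -> float:
    """Keyed pseudorandom number strictly inside (0, 1).

    Depends on the secret key, the token position, and a candidate payload bit.
    """
    digest = hmac.new(secret, f"{index}:{bit}".encode(), hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (n + 0.5) / 2**52


def embed_bit(secret: bytes, payload_bit: int, probs: list[float]) -> list[int]:
    """Toy 'generation': token i is 1 iff the keyed number u_i is below p_i.

    probs[i] stands in for the model's probability that token i equals 1
    (an assumption of this sketch; a real model would condition on the prefix).
    """
    return [
        1 if prf_unit(secret, i, payload_bit) < p else 0
        for i, p in enumerate(probs)
    ]


def score(secret: bytes, candidate_bit: int, tokens: list[int]) -> float:
    """Sum of per-token scores; large when the tokens agree with the keyed numbers."""
    total = 0.0
    for i, x in enumerate(tokens):
        u = prf_unit(secret, i, candidate_bit)
        total += -math.log(u) if x == 1 else -math.log(1.0 - u)
    return total


def retrieve_bit(secret: bytes, tokens: list[int]) -> int:
    """Return the candidate payload bit with the higher score under the key."""
    return max((0, 1), key=lambda b: score(secret, b, tokens))


if __name__ == "__main__":
    probs = [0.5] * 400  # maximally uncertain toy model, 400 binary tokens
    tokens = embed_bit(b"demo-key", 1, probs)
    print(retrieve_bit(b"demo-key", tokens))
```

In this toy setting the decoder only works when the token probabilities carry enough entropy: if every `probs[i]` is close to 0 or 1, both candidates produce nearly the same tokens and the scores no longer separate. This mirrors, informally, the abstract's framing that extraction relies on the key and on the response itself, but the sketch omits everything needed to hide longer payloads or to handle a real model's token distribution.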

Paper Structure

This paper contains 16 sections, 6 theorems, 21 equations, 3 figures, 6 algorithms.

Key Result

Theorem 1

Fix a model $\mathsf{Model}$. Let $\textsc{prompt},\textsc{payload}$ be strings. Conditioned on the empirical entropy of a response $y$ generated by $\mathsf{Steg}_k(\textsc{prompt},\textsc{payload})$ being high enough, the expected length of the prefixes of $\textsc{payload}$ and $\mathsf{Retr}_k(y

Figures (3)

  • Figure 1: We asked Llama 2 to write an email urging a professor to give an easy exam, intended to be sent anonymously. Nevertheless, the part of the response shown above secretly encodes the initials of the user who used the LLM. This secret payload was encoded without modifying the response distribution at all.
  • Figure 2: Plot of the number of successfully hidden payload bits, by length of response. Experiments were run on GPT-2 with a randomly chosen example prompt taken from the OpenAI website. The experiment was performed 100 times for each response length.
  • Figure 3: A breakdown of the decoding algorithm for the example in Figure 1.

Theorems & Definitions (20)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5: Steganography Scheme
  • Definition 2.6: Undetectability
  • Theorem: Informal version of the main theorem
  • Definition 5.1
  • Definition 5.2
  • Theorem 5.3
  • ...and 10 more