Excuse me, sir? Your language model is leaking (information)

Or Zamir

TL;DR

This work introduces a cryptographic method to hide an arbitrary secret payload in the response of a Large Language Model (LLM) and shows that embedding the payload does not affect the quality of the generated text.

Abstract

We introduce a cryptographic method to hide an arbitrary secret payload in the response of a Large Language Model (LLM). A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn, and Zamir (2023), who introduced an undetectable watermarking scheme for LLMs.
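To make the idea of keyed sampling concrete, here is a toy sketch in the spirit of the CGZ watermark, used to hide a single bit. It assumes a binary vocabulary, a hypothetical per-position probability list (`toy_probabilities`) standing in for the model, and HMAC-SHA256 as the PRF. All function names are illustrative. This is not the paper's scheme: the real construction hides multi-bit payloads and uses a dynamic error-correcting code. This sketch keys the PRF only on the bit and the token position, so it reuses randomness across responses and would not stay undetectable over many queries.

```python
import hashlib
import hmac
import math
import random


def prf_uniform(secret, message):
    """Map (secret, message) to a pseudorandom float strictly inside (0, 1) via HMAC-SHA256."""
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so n + 0.5 is exact
    return (n + 0.5) / 2**52


def toy_probabilities(length, seed=0):
    """Hypothetical stand-in for a model: Pr[token = 1] at each position."""
    rng = random.Random(seed)
    return [rng.uniform(0.2, 0.8) for _ in range(length)]


def embed_bit(secret, bit, probs):
    """Sample binary tokens, drawing the randomness from the PRF keyed on (bit, position)."""
    tokens = []
    for i, p1 in enumerate(probs):
        u = prf_uniform(secret, f"{bit}|{i}".encode())
        tokens.append(1 if u < p1 else 0)  # Pr[1] = p1 when u is (pseudo)uniform
    return tokens


def score(secret, bit, tokens):
    """CGZ-style score: sum of -ln(v), where v = u if the token is 1, else 1 - u."""
    total = 0.0
    for i, x in enumerate(tokens):
        u = prf_uniform(secret, f"{bit}|{i}".encode())
        v = u if x == 1 else 1.0 - u
        total += -math.log(v)
    return total


def retrieve_bit(secret, tokens):
    """Guess the hidden bit as the candidate whose keyed score is larger."""
    return max((0, 1), key=lambda b: score(secret, b, tokens))
```

Retrieval never needs the model's probabilities. With the wrong bit guess, each score term behaves like an Exp(1) sample with mean 1. With the correct guess, each term has expected value 1 plus the entropy (in nats) of that token's distribution, so the correct guess wins once the response carries enough entropy.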

Paper Structure (16 sections, 6 theorems, 21 equations, 3 figures, 6 algorithms)

Introduction
Organization of the paper
Model and Preliminaries
Preliminaries
Pseudorandom function (PRF).
Language Models
Entropy and Empirical Entropy
Empirical Entropy in Natural Language
Steganography for LLMs
Overview of the CGZ Watermark
High-Level Overview of Our Scheme
Dynamic Error Correcting Code
Our Scheme
Complete Undetectability
Empirical Evaluation
...and 1 more section

Key Result

Theorem 1

Fix a model $\mathsf{Model}$. Let $\textsc{prompt},\textsc{payload}$ be strings. Conditioned on the empirical entropy of a response $y$ generated by $\mathsf{Steg}_k(\textsc{prompt},\textsc{payload})$ being high enough, the expected length of the prefixes of $\textsc{payload}$ and $\mathsf{Retr}_k(y)$ …
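Theorem 1 is conditioned on the empirical entropy of the response. The sketch below computes it under two assumptions. First, it uses the definition from the CGZ watermark line of work: the negative log-probability that the model assigns to the response it actually produced, which decomposes into a per-token sum. Second, it assumes the caller has already collected the per-token probabilities from the model. `empirical_entropy` is a hypothetical helper name.

```python
import math


def empirical_entropy(token_probs, base=2.0):
    """Empirical entropy of a sampled response: sum over positions of -log p_i(x_i),
    where p_i(x_i) is the probability the model gave the token it actually emitted."""
    total = 0.0
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        total -= math.log(p, base)
    return total
```

A response made entirely of near-deterministic tokens (probabilities close to 1) has empirical entropy near zero. Such a response offers little room to hide payload bits, and the "high enough" condition in the theorem would not hold.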

Figures (3)

Figure 1: We asked Llama 2 to write an email urging a professor to give an easy exam, intended to be sent anonymously. Nevertheless, the part of the response shown above secretly encodes the initials of the user who used the LLM. This secret payload was encoded without modifying the response distribution at all.
Figure 2: Number of successfully hidden payload bits as a function of response length. Experiments were run on GPT-2 with an example prompt chosen at random from the OpenAI website; the experiment was repeated 100 times for each response length.
Figure 3: A breakdown of the decoding algorithm for the example in Figure 1.

Theorems & Definitions (20)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5: Steganography Scheme
Definition 2.6: Undetectability
Theorem: Informal version of the main theorem
Definition 5.1
Definition 5.2
Theorem 5.3
...and 10 more
