Multi-Bit Distortion-Free Watermarking for Large Language Models

Massieh Kordi Boroujeny; Ya Jiang; Kai Zeng; Brian Mark

TL;DR

This work extends LLM watermarking from zero-bit to multi-bit distortion-free embedding, which preserves the original output distribution. Building on zero-bit distortion-free methods, it introduces a Distribution Interval Shift Coding (DISC) framework that embeds multiple bits via a multi-bit watermarking mapping rule and a PRF-based randomness source. The proposed DISC encoder/decoder achieves low bit error rates with efficient decoding, and the paper analyzes the detection threshold and the watermark length required under false positive/false negative constraints. The approach supports content attribution and forensic analysis of AI-generated text while maintaining text quality.

Abstract

Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. The prior methods generally embed zero-bit watermarks that do not provide additional information beyond tagging a text as being AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
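The idea of embedding a message without distorting the output distribution can be illustrated with a small sketch. It is not the paper's DISC algorithm: it assumes a binarized model that emits one bit at a time, uses HMAC-SHA256 as a stand-in PRF, and encodes the message as a cyclic shift of the sampling interval. The names `prf_uniform`, `sample_bit`, and `message_to_shift` are hypothetical and chosen for illustration only.

```python
import hashlib
import hmac


def prf_uniform(key: bytes, context: bytes) -> float:
    """Map (key, context) to a pseudorandom number in [0, 1) via HMAC-SHA256."""
    digest = hmac.new(key, context, hashlib.sha256).digest()
    # Keep 53 bits so the result is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_bit(p_one: float, u: float, shift: float = 0.0) -> int:
    """Return 1 iff u lies in the cyclic interval [shift, shift + p_one) mod 1.

    The interval always has length p_one, so for uniform u the output is 1
    with probability p_one regardless of the shift (the distortion-free idea).
    """
    return 1 if (u - shift) % 1.0 < p_one else 0


def message_to_shift(message: int, num_bits: int) -> float:
    """Assumed encoding: spread the 2**num_bits messages evenly over [0, 1)."""
    return message / 2**num_bits


if __name__ == "__main__":
    key = b"secret-key"  # illustrative key, shared by encoder and decoder
    p_one = 0.3          # model probability that the next bit is 1
    n = 20000
    for message in range(4):
        shift = message_to_shift(message, num_bits=2)
        ones = sum(
            sample_bit(p_one, prf_uniform(key, f"{message}-{i}".encode()), shift)
            for i in range(n)
        )
        print(f"message={message} shift={shift:.2f} freq(1)={ones / n:.3f}")
```

Under these assumptions, the empirical frequency of 1 stays close to `p_one` for every message, while a party holding the key can recompute `u` for each position and test which shift best explains the observed bits.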

Paper Structure (26 sections, 4 theorems, 115 equations, 15 figures, 1 table, 8 algorithms)

Introduction
Preliminaries
Zero-bit Distortion-free Watermarking
Binarization of language model
Watermarking without random initialization
Watermarking with random initialization
Multi-Bit Distortion-free Watermarking
Experiments
Conclusion
Preliminaries
Entropy of a language model
Watermarking mapping rule
Zero-bit Distortion-Free Watermarking
Binarization of language models
Watermarking without random initialization
...and 11 more sections

Key Result

Proposition 2.5

A watermarking algorithm following a watermarking mapping rule $\Gamma_t(\Omega, \mathcal{V}; \mathsf{P}_{\boldsymbol{y}})$ is distortion-free if and only if for every prompt $\alpha$ and the past generated tokens $W_{[t-1]}$,

Figures (15)

Figure 1: Watermarking mapping rule $\Gamma(\Omega, \mathcal{V})$.
Figure 2: Watermarking mapping rule in Christ et al. (2023).
Figure 3: Exact and approximate (dashed lines) $L_{\min}$, $|\mathcal{V}|= 50272$.
Figure 4: Multi-bit watermarking mapping rule in DISC.
Figure 5: BER when extracting bit strings of different lengths from texts of various lengths.
...and 10 more figures

Theorems & Definitions (9)

Definition 2.1: Language Model
Definition 2.2
Definition 2.3: PseudoRandom Functions (PRF)
Definition 2.4: Watermarking mapping rule
Proposition 2.5
Definition 4.1: Multi-bit watermarking mapping rule
Proposition 4.2
Theorem 2.1: Berry-Esseen Theorem
Lemma 3.1
