Multi-Bit Distortion-Free Watermarking for Large Language Models

Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian Mark

TL;DR

This work advances LLM watermarking by enabling multi-bit distortion-free embeddings that preserve the original output distribution. It builds on zero-bit distortion-free methods, introducing a Distribution Interval Shift Coding (DISC) framework that embeds multiple bits via a multi-bit watermarking mapping rule and a PRF-based randomness source. The proposed DISC encoder/decoder achieves low bit error rates with efficient decoding and provides analyses for detection thresholds and required watermark length under false positive/false negative constraints. The approach enhances content attribution and forensic capabilities for AI-generated text while maintaining text quality, with practical implications for secure, accountable AI usage.

Abstract

Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. The prior methods generally embed zero-bit watermarks that do not provide additional information beyond tagging a text as being AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
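The summary above describes embedding multiple bits by shifting intervals of the output distribution, with randomness drawn from a PRF. The following is a minimal sketch of why such a shift can leave the output distribution unchanged. It is not the paper's DISC encoder. It assumes a simple inverse-CDF sampler over a toy vocabulary, and the names `prf_uniform`, `message_to_shift`, `sample_token`, and `empirical_counts` are hypothetical. The idea illustrated is that if $u$ is uniform on $[0, 1)$, then $(u + \delta) \bmod 1$ is also uniform for any fixed shift $\delta$, so each token is still selected with its model probability.

```python
import hashlib
import hmac


def prf_uniform(key: bytes, context: bytes) -> float:
    """Hypothetical PRF: map (key, context) to a number in [0, 1) with HMAC-SHA256."""
    digest = hmac.new(key, context, hashlib.sha256).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def message_to_shift(message: int, num_bits: int) -> float:
    """Assumed encoding: a num_bits-bit message becomes a shift in [0, 1)."""
    if not 0 <= message < 2**num_bits:
        raise ValueError("message does not fit in num_bits")
    return message / 2**num_bits


def sample_token(probs, u: float, shift: float = 0.0) -> int:
    """Inverse-CDF sampling at the circularly shifted point (u + shift) mod 1."""
    point = (u + shift) % 1.0
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if point < cumulative:
            return token
    return len(probs) - 1  # guard against floating-point round-off


def empirical_counts(probs, shift: float, grid: int = 1024):
    """Count selected tokens when u sweeps an evenly spaced grid on [0, 1)."""
    counts = [0] * len(probs)
    for i in range(grid):
        counts[sample_token(probs, i / grid, shift)] += 1
    return counts


# Toy next-token distribution over a three-token vocabulary (illustrative only).
probs = [0.5, 0.25, 0.25]
shift = message_to_shift(5, 4)  # 4-bit message 0b0101

# Token frequencies match with and without the shift.
print(empirical_counts(probs, 0.0), empirical_counts(probs, shift))

# During generation, u would come from the keyed PRF rather than a grid.
u = prf_uniform(b"secret-key", b"prompt|token-index-0")
print(sample_token(probs, u, shift))
```

A decoder holding the same key can recompute each `u` and test which candidate shift best explains the observed tokens. How the paper does this efficiently is not reproduced here.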

Paper Structure (26 sections, 4 theorems, 115 equations, 15 figures, 1 table, 8 algorithms)

This paper contains 26 sections, 4 theorems, 115 equations, 15 figures, 1 table, 8 algorithms.

Key Result

Proposition 2.5

A watermarking algorithm following a watermarking mapping rule $\Gamma_t(\Omega, \mathcal{V}; \mathsf{P}_{\boldsymbol{y}})$ is distortion-free if and only if for every prompt $\alpha$ and the past generated tokens $W_{[t-1]}$,

Figures (15)

  • Figure 1: Watermarking mapping rule $\Gamma(\Omega, \mathcal{V})$.
  • Figure 2: Watermarking mapping rule in Christ et al. (2023).
  • Figure 3: Exact and approximate (dashed lines) $L_{\min}$, $|\mathcal{V}|= 50272$.
  • Figure 4: Multi-bit watermarking mapping rule in DISC.
  • Figure 5: BER when extracting embedded messages of different bit lengths from texts of various lengths.
  • ...and 10 more figures

Theorems & Definitions (9)

  • Definition 2.1: Language Model
  • Definition 2.2
  • Definition 2.3: PseudoRandom Functions (PRF)
  • Definition 2.4: Watermarking mapping rule
  • Proposition 2.5
  • Definition 4.1: Multi-bit watermarking mapping rule
  • Proposition 4.2
  • Theorem 2.1: Berry-Esseen Theorem
  • Lemma 3.1