Provably Robust Multi-bit Watermarking for AI-generated Text

Wenjie Qu; Wengrui Zheng; Tianyang Tao; Dong Yin; Yanze Jiang; Zhihua Tian; Wei Zou; Jinyuan Jia; Jiaheng Zhang

TL;DR

The paper tackles content source tracing of AI-generated text by proposing a scalable multi-bit watermarking scheme that embeds user IDs into generated text via pseudo-random segment assignment combined with Reed-Solomon error-correcting codes. It packs multiple bits into token segments, balances token allocation with dynamic programming, and provides a provable robustness bound against edits. Empirical results show high extraction accuracy (e.g., 97.6% for a 20-bit message in 200 tokens) and strong robustness under various attacks, while maintaining text quality and offering efficient extraction. The work advances practical, provable watermarking for real-world LLM deployment and includes open-source code for reproduction.
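The TL;DR pairs segment assignment with Reed-Solomon codes, and Theorem 4.1 uses the notation $K\in\{0,1\}^{km}$ and $(n, k, t)_{2^m}$. This suggests a $km$-bit message read as $k$ symbols of $m$ bits, expanded to $n$ code symbols (one per segment), with up to $t$ wrongly decoded segments tolerated. The sketch below illustrates that idea with a tiny Reed-Solomon code. It rests on several assumptions: symbols live in GF(2^4), encoding is in evaluation form at the points 1..n, and decoding is a brute-force nearest-codeword search. For a standard RS code, $t = \lfloor (n-k)/2 \rfloor$. Real decoders use algebraic methods such as Berlekamp-Massey. None of this is the authors' implementation.

```python
from itertools import product

M = 4            # symbol size in bits: symbols live in GF(2^4) (assumption)
POLY = 0b10011   # x^4 + x + 1, an irreducible polynomial defining GF(16)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^M) elements: carry-less product reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return result


def poly_eval(coeffs: list[int], x: int) -> int:
    """Evaluate sum(coeffs[i] * x**i) over GF(2^M) with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def bits_to_symbols(bits: str) -> list[int]:
    """Split a k*M-bit message into k symbols of M bits each."""
    return [int(bits[i:i + M], 2) for i in range(0, len(bits), M)]


def symbols_to_bits(symbols: list[int]) -> str:
    """Inverse of bits_to_symbols."""
    return "".join(format(s, f"0{M}b") for s in symbols)


def rs_encode(message: list[int], n: int) -> list[int]:
    """Evaluation-form RS: code symbol j is f(j + 1), f's coefficients are the message."""
    return [poly_eval(message, x) for x in range(1, n + 1)]


def rs_decode_bruteforce(received: list[int], k: int) -> list[int]:
    """Nearest-codeword decoding by exhaustive search; only viable for tiny k and M."""
    n = len(received)
    best, best_dist = [], n + 1
    for cand in product(range(1 << M), repeat=k):
        dist = sum(a != b for a, b in zip(rs_encode(list(cand), n), received))
        if dist < best_dist:
            best, best_dist = list(cand), dist
    return best


if __name__ == "__main__":
    user_id = "101100111000"              # k*M = 12 bits -> k = 3 symbols
    message = bits_to_symbols(user_id)    # [11, 3, 8]
    codeword = rs_encode(message, n=7)    # n = 7 code symbols, t = (7 - 3) // 2 = 2
    corrupted = list(codeword)
    corrupted[0] ^= 0b0101                # simulate two wrongly extracted segments
    corrupted[4] ^= 0b1111
    decoded = rs_decode_bruteforce(corrupted, k=3)
    print(symbols_to_bits(decoded) == user_id)  # True
```

Evaluation form keeps the sketch short. Two distinct polynomials of degree below $k$ agree on at most $k-1$ points, so any two codewords differ in at least $n-k+1$ symbols. As a result, two corrupted segments out of seven still decode uniquely.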

Abstract

Large Language Models (LLMs) have demonstrated a remarkable capability to generate text that resembles human language. However, criminals can misuse them to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique for addressing these concerns: it embeds a message (e.g., a bit string) into text generated by an LLM. By embedding the user ID (represented as a bit string) into generated texts, we can trace generated texts back to the user, a task known as content source tracing. The major limitation of existing watermarking techniques is that they achieve sub-optimal performance for content source tracing in real-world scenarios, because they cannot accurately or efficiently extract a long message from a generated text. We aim to address these limitations. In this work, we introduce a new watermarking method for LLM-generated text grounded in pseudo-random segment assignment. We also propose multiple techniques to further enhance the robustness of our watermarking algorithm. We conduct extensive experiments to evaluate our method. Our experimental results show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a 20-bit message into a 200-token generated text, our method achieves a match rate of $97.6\%$, while the state-of-the-art method of Yoo et al. achieves only $49.2\%$. Additionally, we prove that under the same setting our watermark tolerates edits within an edit distance of 17 per paragraph on average.
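Content source tracing, as described above, turns a user ID into a bit string before embedding and maps an extracted bit string back to a user afterwards. The sketch below covers only that bookkeeping. It assumes IDs are non-negative integers packed into 20 bits, which matches the message length in the abstract's example and allows $2^{20} = 1{,}048{,}576$ distinct users. The registry and all function names are hypothetical.

```python
from typing import Optional

ID_BITS = 20  # message length from the abstract's example (assumption: one ID per message)


def user_id_to_bits(user_id: int, width: int = ID_BITS) -> str:
    """Encode a numeric user ID as the fixed-width bit string to embed."""
    if not 0 <= user_id < (1 << width):
        raise ValueError(f"user_id must fit in {width} bits")
    return format(user_id, f"0{width}b")


def bits_to_user_id(bits: str) -> int:
    """Inverse of user_id_to_bits."""
    return int(bits, 2)


def trace(extracted_bits: str, registry: dict[int, str]) -> Optional[str]:
    """Return the account whose ID matches the extracted watermark, or None."""
    return registry.get(bits_to_user_id(extracted_bits))


if __name__ == "__main__":
    registry = {12345: "alice", 777: "bob"}   # hypothetical provider-side user table
    embedded = user_id_to_bits(12345)         # '00000011000000111001'
    print(trace(embedded, registry))          # alice
```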

Paper Structure

This paper contains 26 sections, 1 theorem, 17 equations, 7 figures, 4 tables, and 5 algorithms.

Introduction
Background and Related Work
Zero-bit watermarking
Multi-bit watermarking
Background on Reed-Solomon codes
Problem Formulation
Problem definition
Design goals
Methodology
Design insights
Insights from previous works
Key ideas of our watermark design
Further improvements
Design details of multi-bit watermarking
Theoretical robustness analysis
...and 11 more sections

Key Result

Theorem 4.1

Let $S$ be a text paragraph generated by Algorithm alg:encode before any editing. Denote the embedded information by $K\in \{0,1\}^{km}$, the number of tokens by $T$, the error-correction code by $(n, k, t)_{2^m}$, the number of tokens allocated to each segment by $(c_1,\cdots,c_n)$, and the number of green tokens in each segment by $(d_1,\cdots,d_n)$ …
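Theorem 4.1 and the abstract express robustness as an edit distance between the original and the edited paragraph. The sketch below assumes this is the usual token-level Levenshtein distance, with unit-cost insertions, deletions, and substitutions. It computes that distance and checks an edit against a per-paragraph bound. The helper `is_certified` and the bound value are hypothetical. The actual bound would come from Theorem 4.1, whose full statement is truncated here.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Token-level Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute, free if tokens match
        prev = cur
    return prev[-1]


def is_certified(original: list[str], edited: list[str], bound: int) -> bool:
    """Hypothetical helper: True if the edit stays within a certified per-paragraph bound."""
    return edit_distance(original, edited) <= bound


if __name__ == "__main__":
    original = "the cat sat on the mat".split()
    edited = "the dog sat on a mat".split()
    print(edit_distance(original, edited))           # 2
    print(is_certified(original, edited, bound=17))  # True
```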

Figures (7)

Figure 1: Outline of our watermarking application scenario and workflow. Users query the LLM with a prompt. During text generation, using our watermarking method, the service provider embeds a unique user ID into the generated text. Later, when some suspicious LLM-generated text used for malicious purposes is found, the service provider can identify and extract the watermark to trace the original user who generated the text.
Figure 2: Simplified example of watermark embedding based on pseudo-random segment assignment. (1) Determine the index of the segment to embed based on the previous token. (2) Obtain seed $s$ based on the previous token and the segment value. (3) Select the green list using seed $s$.
Figure 3: Match rate of our method on different datasets, LLMs, and bit lengths.
Figure 4: Robustness bound of our method on different datasets, LLMs, and bit lengths.
Figure 5: Our method maintains the quality of texts generated by LLMs. The perplexity of texts generated by LLMs is similar with and without our watermark.
...and 2 more figures
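The Figure 2 caption outlines three embedding steps. The toy sketch below walks through that loop together with a natural extraction rule. For each segment, the rule picks the candidate value whose green lists contain the most observed tokens. The sketch makes several assumptions that do not come from the source: SHA-256 stands in for a keyed hash (a deployment would mix in a secret key), the "model" emits random logits over a 50-token vocabulary, green lists cover half the vocabulary, green tokens get a logit bias of 4.0, and each segment carries 2 bits. The helper names and the extraction rule are hypothetical. Reed-Solomon coding and the dynamic-programming token balancing mentioned in the TL;DR are omitted.

```python
import hashlib
import math
import random

VOCAB = 50   # toy vocabulary size (assumption)
GAMMA = 0.5  # fraction of the vocabulary in each green list (assumption)
DELTA = 4.0  # logit bias added to green tokens (assumption)
M_BITS = 2   # bits carried by each segment (assumption)


def h(*parts) -> int:
    """Deterministic 64-bit hash; stand-in for a keyed pseudo-random function."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def segment_index(prev_token: int, n_segments: int) -> int:
    # Step 1: the previous token selects which segment this position carries.
    return h("seg", prev_token) % n_segments


def green_list(prev_token: int, value: int) -> set[int]:
    # Steps 2-3: derive a seed from (previous token, segment value), then draw the green list.
    rng = random.Random(h("seed", prev_token, value))
    return set(rng.sample(range(VOCAB), int(GAMMA * VOCAB)))


def generate(segments: list[int], length: int, seed: int = 0) -> list[int]:
    """Toy generator: random 'model' logits, biased toward the current green list."""
    rng = random.Random(seed)
    tokens = [0]  # fixed start token
    for _ in range(length):
        prev = tokens[-1]
        value = segments[segment_index(prev, len(segments))]
        green = green_list(prev, value)
        logits = [rng.random() + (DELTA if v in green else 0.0) for v in range(VOCAB)]
        weights = [math.exp(x) for x in logits]
        tokens.append(rng.choices(range(VOCAB), weights=weights)[0])
    return tokens


def extract(tokens: list[int], n_segments: int) -> list[int]:
    """Per segment, return the value whose green lists contain the most observed tokens."""
    counts = [[0] * (1 << M_BITS) for _ in range(n_segments)]
    for prev, tok in zip(tokens, tokens[1:]):
        i = segment_index(prev, n_segments)
        for value in range(1 << M_BITS):
            if tok in green_list(prev, value):
                counts[i][value] += 1
    return [max(range(1 << M_BITS), key=row.__getitem__) for row in counts]


if __name__ == "__main__":
    message = [3, 0, 2, 1]  # four 2-bit segment values: 11 00 10 01
    text = generate(message, length=200)
    print(extract(text, len(message)))
```

Extraction needs no model access, only the hash. That matches the workflow in Figure 1, where the service provider inspects a suspicious text after the fact.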

Theorems & Definitions (9)

Definition 3.1: Multi-bit Watermarking
Definition 4.1
Theorem 4.1
Definition 4.2
Proof
Definition A.1
Definition A.2
Definition A.3
Definition A.4
