Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian Mark
TL;DR
This work extends LLM watermarking to multi-bit, distortion-free watermarks that preserve the model's original output distribution. Building on zero-bit distortion-free methods, it introduces a Distribution Interval Shift Coding (DISC) framework that embeds multiple bits through a multi-bit watermarking mapping rule and a PRF-based randomness source. The proposed DISC encoder and decoder achieve low bit error rates with efficient decoding, and the accompanying analysis covers detection thresholds and the watermark length required under false positive and false negative constraints. Because text quality is maintained, the approach supports content attribution and forensic analysis of AI-generated text.
Abstract
Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution. These alterations also distort the quality of the text, which exposes the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. These prior methods generally embed zero-bit watermarks, which provide no information beyond tagging a text as AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with a low bit error rate.
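To make the idea of a keyed, distortion-free, multi-bit embedding more concrete, the following is a minimal sketch of one plausible reading of an interval shift: a pseudorandom number derived from a secret key drives inverse-transform sampling, and the message cyclically shifts that number before sampling. It is not the authors' DISC algorithm. The function names (`prf_uniform`, `sample_token`, `embed_step`), the use of HMAC-SHA256 as the PRF, and the shift rule `message / 2**num_bits` are all assumptions made for illustration.

```python
import hashlib
import hmac


def prf_uniform(key: bytes, context: bytes) -> float:
    """Map (key, context) to a pseudorandom float in [0, 1) via HMAC-SHA256.

    HMAC-SHA256 stands in for the PRF here; this choice is an assumption.
    """
    digest = hmac.new(key, context, hashlib.sha256).digest()
    # Keep the top 53 bits so the result is an exact double strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_token(probs, u):
    """Inverse-transform sampling: return the index whose CDF interval contains u."""
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def embed_step(probs, key, context, message, num_bits):
    """Sample one token from a PRF value shifted by a message-dependent offset.

    Hypothetical shift rule: the message (an integer in [0, 2**num_bits))
    moves the PRF output cyclically by message / 2**num_bits.
    """
    shift = message / 2**num_bits
    u = (prf_uniform(key, context) + shift) % 1.0
    return sample_token(probs, u)
```

A cyclic shift of a uniform value on [0, 1) is still uniform, so in this sketch each token is drawn from `probs` unchanged for every message, which is the sense in which such a scheme is distortion-free. A decoder holding the key could recompute `prf_uniform` for each context and test which candidate shift best explains the observed tokens; that step is omitted here.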
