An Unforgeable Publicly Verifiable Watermark for Large Language Models
Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, Philip S. Yu
TL;DR
This work tackles publicly verifiable text watermarking for large language models by introducing UPV, a framework that uses two separate neural networks for watermark generation and detection and shares token embeddings between them to keep detection efficient. UPV enables public detection without exposing the watermark generation key and argues for unforgeability through computational asymmetry: obtaining the detector from the generator is efficient, while forging the watermark from the detector is computationally hard. Empirical results across GPT-2, OPT, and LLaMA-7B on multiple datasets show near-baseline detection performance with minimal false positives and negligible impact on text quality or decoding speed. The approach provides a practical, secure option for detecting machine-generated text at scale with publicly accessible detectors.
Abstract
Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which lets the detection network reach high accuracy efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Our code is available at https://github.com/THU-BPM/unforgeable_watermark. Our algorithm can also be accessed through MarkLLM (Pan et al., 2024) at https://github.com/THU-BPM/MarkLLM.
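The split between a private generation network and a public detection network that reuse the same token embedding can be illustrated with a toy sketch. This is not the authors' architecture: the vocabulary size, embedding width, window length, the linear scoring rules, and the class names `WatermarkGenerator` and `WatermarkDetector` are all assumptions chosen for brevity, and the detector weights are random rather than trained.

```python
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (assumption)
EMBED_DIM = 8    # toy embedding width (assumption)
WINDOW = 3       # tokens the generator inspects per decision (assumption)

rng = random.Random(0)


def rand_vec(n):
    """Return n pseudo-random floats in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# A single embedding table, referenced by both networks.
shared_embedding = [rand_vec(EMBED_DIM) for _ in range(VOCAB_SIZE)]


class WatermarkGenerator:
    """Private network: labels the last token of a window green (1) or red (0)."""

    def __init__(self, embedding):
        self.embedding = embedding
        self.weights = rand_vec(WINDOW * EMBED_DIM)  # kept secret

    def is_green(self, window):
        features = [x for tok in window for x in self.embedding[tok]]
        score = sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0


class WatermarkDetector:
    """Public network: scores a whole sequence and holds no generator weights."""

    def __init__(self, embedding):
        self.embedding = embedding
        self.weights = rand_vec(EMBED_DIM)  # would be trained in practice
        self.bias = 0.0

    def score(self, tokens):
        # Mean-pool the shared embeddings, then apply a logistic unit.
        pooled = [
            sum(self.embedding[t][d] for t in tokens) / len(tokens)
            for d in range(EMBED_DIM)
        ]
        logit = sum(w * x for w, x in zip(self.weights, pooled)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


generator = WatermarkGenerator(shared_embedding)
detector = WatermarkDetector(shared_embedding)

tokens = [rng.randrange(VOCAB_SIZE) for _ in range(20)]
labels = [
    generator.is_green(tokens[i - WINDOW + 1 : i + 1])
    for i in range(WINDOW - 1, len(tokens))
]
probability = detector.score(tokens)
```

In this sketch only the embedding table is common to both objects. Publishing `detector` would reveal its own weights and the embeddings but not `generator.weights`, which mirrors the separation the abstract describes without reproducing the training procedure or the forgery analysis.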
