REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, Farinaz Koushanfar
TL;DR
REMARK-LLM presents a robust, efficient framework to watermark LLM-generated text by embedding binary signatures into the model's output distribution via a learning-based message encoding, a differentiable reparameterization to a sparse token distribution, and a transformer-based decoding module for signature extraction. An optimized beam search during insertion preserves linguistic coherence while enabling reliable watermark retrieval, even under malicious transformations seen in real-world scenarios. End-to-end training minimizes semantic loss and maximizes watermark recoverability, while incorporating transformations to harden against removal and detection attacks. Across multiple datasets and unseen LLM architectures, REMARK-LLM achieves up to 2$\times$ more embedded bits with strong semantic fidelity, transferability, and resilience, suggesting practical applicability for IP protection, anti-plagiarism, and misinformation tracing in real-world deployments.
Abstract
We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address these challenges, REMARK-LLM proposes three new components: (i) a learning-based message encoding module to infuse binary signatures into LLM-generated texts; (ii) a reparameterization module to transform the dense distributions from the message encoding to the sparse distribution of the watermarked textual tokens; and (iii) a decoding module dedicated to signature extraction. Furthermore, we introduce an optimized beam search algorithm to guarantee the coherence and consistency of the generated content. REMARK-LLM is rigorously trained to encourage the preservation of semantic integrity in watermarked content, while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability in inserting 2 times more signature bits into the same texts when compared to prior art, all while maintaining semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.
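To make the reparameterization step in (ii) concrete, the sketch below shows one common way to map a dense vector of vocabulary scores to a near one-hot token distribution while staying differentiable in principle: the Gumbel-softmax relaxation. The abstract does not name the technique it uses, so the choice of Gumbel-softmax here is an assumption, and the names `gumbel_softmax`, `to_token`, `dense_logits`, and the temperature value are hypothetical and chosen only for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Return a relaxed (soft) one-hot sample over the vocabulary.

    Assumption: this stands in for the reparameterization module;
    the abstract does not specify the actual method.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
        # Clamp u away from 0 and 1 to keep both logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_token(probs):
    """Pick the discrete token id with the largest relaxed probability."""
    return max(range(len(probs)), key=probs.__getitem__)


rng = random.Random(0)                    # fixed seed for repeatability
dense_logits = [2.0, 0.5, -1.0, 0.1]      # toy 4-token vocabulary
soft = gumbel_softmax(dense_logits, temperature=0.1, rng=rng)
token_id = to_token(soft)
```

Lower temperatures push the relaxed sample toward a one-hot vector, which is what allows a dense encoder output to be read as a sparse token choice; in a real training setup the same computation would be written in an autodiff framework so that gradients can flow back to the message encoder.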
