Yet Another Watermark for Large Language Models

Siyuan Bao, Ying Shi, Zhiguang Yang, Hanzhou Wu, Xinpeng Zhang

TL;DR

This work addresses the challenge of watermarking large language models (LLMs) without retraining while preserving text quality. It introduces a parameter-level approach that sparsely modulates the output-layer weights under a secret key, so the watermark is coupled to the generation process itself. Detection is statistical: the verifier measures how often a secretly selected token subset appears in generated text and computes a $z$-score, which permits black-box verification while keeping perplexity (PPL) close to that of non-watermarked outputs. The method shows strong detectability and robustness under common text edits, offering a practical IP-protection mechanism for LLMs that is applicable across architectures. By embedding the signal in model parameters rather than in post-processing, the framework offers a different perspective on watermarking, with implications for scalability and security in generative AI.
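The detection step described above can be illustrated with a minimal sketch. It assumes that $\gamma$ is the fraction of the vocabulary placed in the secret subset and that the statistic is the standard one-proportion $z$-test; neither is confirmed by this summary. The keyed-hash selection and the names `selected_token_ids` and `z_score` are hypothetical, introduced only for illustration.

```python
import hashlib
import math


def selected_token_ids(secret_key: str, vocab_size: int, gamma: float) -> set:
    """Hypothetical keyed selection (an assumption, not the paper's rule):
    rank token ids by a keyed SHA-256 hash and keep the first gamma fraction."""
    def keyed_rank(token_id: int) -> bytes:
        return hashlib.sha256(f"{secret_key}:{token_id}".encode()).digest()

    ranked = sorted(range(vocab_size), key=keyed_rank)
    return set(ranked[: int(gamma * vocab_size)])


def z_score(token_ids: list, selected: set, gamma: float) -> float:
    """One-proportion z-test: how far the observed count of selected tokens
    lies above the count expected from unwatermarked text (gamma * n)."""
    n = len(token_ids)
    if n == 0:
        raise ValueError("need at least one token")
    hits = sum(1 for t in token_ids if t in selected)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


# Usage: only the token ids of the text and the key are needed, not the model.
selected = selected_token_ids("demo-key", vocab_size=1000, gamma=0.5)
all_hits = sorted(selected)[:100]           # 100 tokens, all from the subset
print(z_score(all_hits, selected, 0.5))     # 10.0, i.e. sqrt(100)
```

Under these assumptions, a text whose tokens fall in the subset about as often as chance predicts gives a $z$-score near zero, and the verifier would flag a text only when the score exceeds a chosen threshold.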

Abstract

Existing watermarking methods for large language models (LLMs) mainly embed the watermark by adjusting the token sampling prediction or by post-processing. They lack intrinsic coupling with the LLM, which may significantly reduce the semantic quality of the generated marked texts. Traditional watermarking methods based on training or fine-tuning may be extendable to LLMs. However, most of them are limited to the white-box scenario or are very time-consuming because of the massive number of parameters in LLMs. In this paper, we present a new watermarking framework for LLMs, in which the watermark is embedded by manipulating the internal parameters of the LLM and can be extracted from the generated text without accessing the LLM. Compared with related methods, the proposed method entangles the watermark with the intrinsic parameters of the LLM, which better balances the robustness and imperceptibility of the watermark. Moreover, the proposed method allows the watermark to be extracted in the black-box scenario, which is computationally efficient. Experimental results verify the feasibility, superiority, and practicality of the proposed method. This work provides a perspective different from mainstream works, which may shed light on future research.
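To make "manipulating the internal parameters" concrete, here is a minimal sketch of one possible modulation: multiplying the output-layer rows of a chosen token subset by a factor $\alpha$. This summary does not state the actual rule, so the scaling scheme, the reading of $\alpha$ as a multiplicative strength, and the names `scale_selected_rows` and `next_token_probs` are all assumptions. The subset is taken as given, as if already derived from the secret key.

```python
import math


def scale_selected_rows(output_weights: list, selected: set, alpha: float) -> list:
    """Return a copy of the output-layer matrix (one row per token id) with
    the rows of the selected token ids multiplied by alpha.
    Assumed scheme for illustration; not confirmed as the paper's rule."""
    return [
        [alpha * w for w in row] if token_id in selected else list(row)
        for token_id, row in enumerate(output_weights)
    ]


def next_token_probs(output_weights: list, hidden: list) -> list:
    """Softmax over logits = W @ hidden, computed with plain Python lists."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]
    peak = max(logits)                      # subtract the max for stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy model: 4 tokens, hidden size 2, every logit equal before marking.
weights = [[1.0, 1.0] for _ in range(4)]
hidden = [1.0, 1.0]
marked = scale_selected_rows(weights, selected={0, 1}, alpha=1.1)

print(next_token_probs(weights, hidden))    # uniform: 0.25 for each token
print(next_token_probs(marked, hidden))     # tokens 0 and 1 gain probability
```

In this toy setting the bias toward the selected tokens arises inside the model's own forward pass, so no sampling-time intervention is needed. Note that row scaling raises a token's logit only when that logit is positive, which is one reason a real scheme may differ from this sketch.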

Paper Structure

This paper contains 12 sections, 5 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Text quality (PPL) and detection performance ($z$-score) for different values of $\alpha$, with $\gamma = 0.5$.
  • Figure 2: Text quality (PPL) and detection performance ($z$-score) for different values of $\gamma$, with $\alpha = 1.1$.
  • Figure 3: The robustness of the watermark against common text-level attacks. Here, we use $\alpha = 1.1$ and $\gamma = 0.5$.