Duwak: Dual Watermarks in Large Language Models

Chaoyi Zhu, Jeroen Galjaard, Pin-Yu Chen, Lydia Y. Chen

TL;DR

Duwak tackles the challenge of efficiently auditing and governing LLM-generated text by introducing dual watermarking that embeds signals in both the token probability distribution and the sampling process. The token-probability watermark uses a secret green-red split with logit bias, while the sampling watermark employs a contrastive-search mechanism with a sliding window to preserve diversity and minimize repetition. A Fisher-based fusion of two independent p-values enables robust detection, and a theoretical bound shows the watermarks interact in a controlled, separable manner. Empirically, on Llama2-7b and Vicuna-7b-v1.5, Duwak delivers high text-quality metrics and requires significantly fewer tokens for detection than prior single-watermark methods, even under strong post-editing attacks, demonstrating practical benefits for governance and accountability of generated content.
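The Fisher-based fusion mentioned above can be illustrated with a short sketch. It assumes the two detectors each return a p-value and that these are independent under the null hypothesis (unwatermarked text). The function name `fisher_combine` is hypothetical and not taken from the paper. It uses the closed form of the chi-squared survival function for even degrees of freedom, so no external library is needed.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom (k = number of p-values).
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Survival function of chi-squared with 2k degrees of freedom:
    #   exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two moderately significant p-values (illustrative numbers only)
combined = fisher_combine([0.05, 0.05])
```

For two p-values the result reduces to $p_1 p_2 \,(1 - \ln(p_1 p_2))$, so two individually weak signals can yield a combined p-value smaller than either one.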

Abstract

As large language models (LLMs) are increasingly used for text generation tasks, it is critical to audit their usage, govern their applications, and mitigate their potential harms. Existing watermark techniques have been shown to be effective at embedding single human-imperceptible and machine-detectable patterns without significantly affecting generated text quality and semantics. However, the efficiency of watermark detection, i.e., the minimum number of tokens required to assert detection with significance and robustness against post-editing, is still debatable. In this paper, we propose Duwak to fundamentally enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme. To mitigate the expression degradation caused by biasing toward certain tokens, we design a contrastive search to watermark the sampling scheme, which minimizes token repetition and enhances diversity. We theoretically explain the interdependency of the two watermarks within Duwak. We evaluate Duwak extensively on Llama2 under various post-editing attacks, against four state-of-the-art watermarking techniques and combinations of them. Our results show that Duwak-marked text achieves the highest watermarked text quality at the lowest required token count for detection, up to 70% fewer tokens than existing approaches, especially under post-paraphrasing.
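The "biasing toward certain tokens" in the token-probability watermark can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the green set is assumed to be derived from a secret key and the previous token, and the names `green_set`, `bias_logits`, `gamma` (green fraction), and `delta` (logit bias) are hypothetical.

```python
import hashlib
import random


def green_set(secret_key, prev_token, vocab_size, gamma=0.5):
    """Pseudo-randomly pick a 'green' subset of the vocabulary.

    Assumption: the split is seeded by a secret key and the previous token id.
    """
    digest = hashlib.sha256(f"{secret_key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits, green, delta=2.0):
    """Add delta to the logit of every green token; red tokens are unchanged."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


# Toy vocabulary of 10 tokens with flat logits
green = green_set("my-secret", prev_token=3, vocab_size=10, gamma=0.5)
biased = bias_logits([0.0] * 10, green, delta=2.0)
```

Because the split is a deterministic function of the key and the context, a detector holding the same key can recompute the green set for each position and test whether green tokens are over-represented.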

Paper Structure

This paper contains 22 sections, 28 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 2: Rating vs. token efficiency under different watermarking methods and hyper-parameter settings for different detection $p$-values.
  • Figure 3: Detection efficiency ($\downarrow$) of Duwak and KGW with equal hyper-config under varying $\delta$.
  • Figure 4: Comparative analysis of Duwak and KGW with identical hyper-parameters under varying $\delta$, detection efficiency ($\downarrow$).
  • Figure 5: Comparison of empirical false positive rate and theoretical false positive rate for different watermarks.
  • Figure 6: Detection efficiency vs. rating under different watermarking methods and hyper-parameter settings with p-value $0.01$ and $0.01$. Arrows are drawn between the corresponding configurations with different p-values to indicate the reduction of detection tokens required for a more lenient p-value.

Theorems & Definitions (2)

  • proof
  • proof