An Entropy-based Text Watermarking Detection Method

Yijian Lu; Aiwei Liu; Dianzhi Yu; Jingjing Li; Irwin King

TL;DR

Large language models make it easy to generate text that is hard to distinguish from human writing, creating a need for robust watermark detection. The paper proposes Entropy-based Text Watermarking Detection (EWD), which weights each token according to its entropy to improve detection in low-entropy texts while remaining training-free. Theoretical analysis shows that EWD maintains the type-I error and lowers the type-II error in low-entropy scenarios. Experiments on code and text tasks show improved detection in low-entropy settings, competitive performance in high-entropy settings, and resilience to back-translation. The approach is general, automated, and adaptable to different entropy distributions.

Abstract

Text watermarking algorithms for large language models (LLMs) can effectively identify machine-generated texts by embedding and detecting hidden features in the text. Although current text watermarking algorithms perform well in most high-entropy scenarios, their performance in low-entropy scenarios still needs to be improved. In this work, we argue that the influence of token entropy should be fully considered in the watermark detection process, i.e., the weight of each token during watermark detection should be customized according to its entropy, rather than setting the weights of all tokens to the same value as in previous methods. Specifically, we propose Entropy-based Text Watermarking Detection (EWD), which gives higher-entropy tokens higher influence weights during watermark detection, so as to better reflect the degree of watermarking. Furthermore, the proposed detection process is training-free and fully automated. Our experiments demonstrate that EWD achieves better detection performance in low-entropy scenarios, and that the method is general and can be applied to texts with different entropy distributions. Our code and data are available at https://github.com/luyijian3/EWD. Additionally, our algorithm can be accessed through MarkLLM (Pan et al., 2024) at https://github.com/THU-BPM/MarkLLM.
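The idea of weighting tokens by entropy during detection can be illustrated with a small sketch. It is not the authors' implementation: the function names, the min-max normalization, and the example values are assumptions made for illustration. The only statistical ingredient is the standard one-sided z-test for a weighted sum of independent green/non-green indicators, each green with probability gamma under the null hypothesis; with equal weights it reduces to the usual unweighted green-token z-score.

```python
import math


def linear_weights(entropies):
    """Map per-token entropies to weights in [0, 1] by min-max scaling.

    Hypothetical helper: the paper's exact weight function may differ.
    """
    lo, hi = min(entropies), max(entropies)
    if hi == lo:
        # All tokens equally uncertain: fall back to uniform weights.
        return [1.0] * len(entropies)
    return [(e - lo) / (hi - lo) for e in entropies]


def weighted_z_score(is_green, weights, gamma=0.5):
    """z-score of the weighted green-token mass under the null hypothesis.

    Under the null, token i is green with probability gamma, so the weighted
    sum has mean gamma * sum(w) and variance gamma * (1 - gamma) * sum(w^2).
    """
    if len(is_green) != len(weights):
        raise ValueError("is_green and weights must have the same length")
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    green_mass = sum(w for g, w in zip(is_green, weights) if g)
    expected = gamma * sum(weights)
    return (green_mass - expected) / math.sqrt(gamma * (1 - gamma) * sum_sq)


# Toy low-entropy sequence: only the three high-entropy tokens are green.
is_green = [True, True, True, False, False, False, False, False]
entropies = [2.0, 1.8, 1.5, 0.1, 0.0, 0.05, 0.1, 0.0]

uniform_z = weighted_z_score(is_green, [1.0] * len(is_green))
entropy_z = weighted_z_score(is_green, linear_weights(entropies))
print(f"uniform weights: {uniform_z:.3f}, entropy weights: {entropy_z:.3f}")
```

In this toy case the low-entropy tokens, which the watermark could barely influence, drag the unweighted score down, while the entropy-weighted score is driven by the tokens where the watermark could actually act.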

Paper Structure (18 sections, 9 equations, 3 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Text Generation Process of LLMs
Text Watermarking
Token Entropy and Low-entropy Scenario
Proposed Method
Motivation
Entropy-based Text Watermarking Detection
Theoretical Analysis
Type-I Error
Type-II Error
Experiments
Experiment Settings
Main Results
...and 3 more sections

Figures (3)

Figure 1: Compared with watermarked text composed mostly of high-entropy tokens, watermarked code composed mostly of low-entropy tokens contains significantly fewer green tokens, resulting in a small detection z-score. The bottom of the figure shows that the proportion of green tokens decreases as token entropy decreases.
Figure 2: Subfigure (a) shows the z-scores of watermarked and human texts in the Rotowire, HumanEval and MBPP datasets, each detected with three different methods. Subfigure (b) shows the relationship between token weights and the probability of being green in both watermarked and human texts.
Figure 3: We evaluate two additional weight functions besides Linear and measure their performance on the code detection datasets. Subfigure (a) visualizes all studied functions, with normalized spike entropy as input. Subfigure (b) shows each function's detection F1 under 1% FPR in comparison with the SWEET baseline. Each data point can be matched to the corresponding function in subfigure (a) by its shape and x-axis value.
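
Figure 3 reports F1 at a 1% false-positive rate (FPR). A minimal sketch of one common way to compute such a metric follows; it is not the authors' evaluation script. The function name `f1_at_fpr`, the thresholding rule, and the toy scores are assumptions made for illustration. The threshold is chosen from the human-text z-scores so that at most the target fraction of them is flagged, and F1 is then computed on the combined set.

```python
def f1_at_fpr(human_z, watermarked_z, target_fpr=0.01):
    """Return (threshold, f1) with the threshold set from human scores.

    A text is flagged as watermarked when its z-score is strictly greater
    than the threshold. Hypothetical helper, not from the paper's code.
    """
    if not human_z or not watermarked_z:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(human_z, reverse=True)
    # Number of human texts we are allowed to flag by mistake.
    allowed = min(int(target_fpr * len(ranked)), len(ranked) - 1)
    # At most `allowed` human scores lie strictly above this value.
    threshold = ranked[allowed]

    tp = sum(z > threshold for z in watermarked_z)
    fp = sum(z > threshold for z in human_z)
    fn = len(watermarked_z) - tp
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return threshold, f1


# Toy scores: 100 human texts spread over [-1, 0.98], 100 watermarked texts
# of which 90 score clearly high and 10 look like human text.
human_scores = [i / 50 - 1 for i in range(100)]
watermarked_scores = [5.0] * 90 + [0.0] * 10

threshold, f1 = f1_at_fpr(human_scores, watermarked_scores)
print(f"threshold={threshold:.2f}, F1={f1:.4f}")
```

Fixing the FPR on human text before measuring F1 keeps comparisons between detectors fair, since each detector is allowed the same rate of false alarms.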
