Who Wrote this Code? Watermarking for Code Generation

Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, Gunhee Kim

TL;DR

The paper addresses the challenge of detecting machine-generated code without severely degrading code quality by improving watermarking methods for code generation. It introduces SWEET, a selective watermarking approach that applies green-red token signaling only to high-entropy tokens, improving detectability (AUROC) while preserving functional code quality relative to prior methods like WLLM. Across HumanEval, MBPP, DS-1000, and multiple languages, SWEET outperforms baselines and demonstrates robustness to prompts, surrogate detectors, and certain paraphrasing attacks, though it notes limitations in paraphrase resistance and the need for entropy-threshold calibration. The work provides practical guidance for deploying code watermarking in real-world settings and offers theoretical and empirical support for selective entropy-based watermarking in low-entropy code generation tasks.

Abstract

Since the remarkable generation performance of large language models has raised ethical and legal concerns, approaches to detect machine-generated text by embedding watermarks are being developed. However, we discover that the existing works fail to function appropriately in code generation tasks due to the task's low-entropy nature. Extending a logit-modifying watermark method, we propose Selective WatErmarking via Entropy Thresholding (SWEET), which enhances detection ability and mitigates code quality degeneration by removing low-entropy segments when generating and detecting watermarks. Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines, including post-hoc detection methods, in detecting machine-generated code text. Our code is available at https://github.com/hongcheki/sweet-watermark.
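
The generation-side idea in the abstract (bias green-list logits only at high-entropy positions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes Shannon entropy in nats as the entropy measure, a strict "greater than" comparison against the threshold, and a hypothetical hash-of-previous-token scheme for choosing the green list. The function names are invented for this sketch; the default values $\gamma=0.25$, $\delta=3.0$, and threshold 1.2 are the settings quoted in the figure captions.

```python
import hashlib
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def green_list(prev_token, vocab_size, gamma):
    """Pseudo-random green list seeded by the previous token.

    The SHA-256 seeding is an assumption made for this sketch.
    """
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def selective_watermark_logits(logits, prev_token, gamma=0.25, delta=3.0, tau=1.2):
    """Add delta to green-token logits only when the entropy exceeds tau.

    Returns the (possibly modified) logits and a flag saying whether
    this position was watermarked.
    """
    probs = softmax(logits)
    if shannon_entropy(probs) <= tau:
        # Low-entropy position: leave the distribution untouched.
        return list(logits), False
    greens = green_list(prev_token, len(logits), gamma)
    biased = [x + delta if i in greens else x for i, x in enumerate(logits)]
    return biased, True


# A sharply peaked distribution is skipped; a flat one is watermarked.
peaked = [10.0] + [0.0] * 7
flat = [0.0] * 8
print(selective_watermark_logits(peaked, prev_token=5)[1])
print(selective_watermark_logits(flat, prev_token=5)[1])
```

Skipping low-entropy positions is what protects code quality in this sketch: where the model is nearly certain of the next token (for example, a closing bracket), the logits are never perturbed.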

Paper Structure

This paper contains 39 sections, 2 theorems, 14 equations, 14 figures, 3 tables, and 2 algorithms.

Key Result

Theorem 1

Consider a token sequence $\bm{y}=\{y_0,\dots,y_{N-1}\}$ generated by a watermarked code LLM, and let $(S_0,\dots,S_{N-1})$ be the corresponding sequence of spike entropies, in which the modulus is $\frac{(1-\gamma)(e^\delta-1)}{1+(e^\delta-1)\gamma}$. Let $\tau$ be an entropy threshold, and let $N^{l}$ and $N^{h}$ be [...] then there is a lower bound of the $z$-score that is always higher when the entropy threshold is applied.
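
A toy calculation can illustrate the intuition behind this statement; it is not the proof. The sketch below assumes the standard one-proportion $z$-score used by green-red list watermarks, $z = (g - \gamma n)/\sqrt{n\gamma(1-\gamma)}$, where $g$ is the number of green tokens among the $n$ scored tokens (these symbols, the function names, and the token data are all made up for illustration). Low-entropy tokens land in the green list only at the chance rate $\gamma$, so dropping them from the count can raise the score.

```python
import math


def z_score(green_flags, gamma=0.25):
    """One-proportion z-score for the green-token count among scored tokens."""
    n = len(green_flags)
    if n == 0:
        return 0.0
    greens = sum(green_flags)
    return (greens - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


def selective_z_score(green_flags, entropies, tau, gamma=0.25):
    """Score only the tokens whose entropy exceeds the threshold tau."""
    kept = [g for g, h in zip(green_flags, entropies) if h > tau]
    return z_score(kept, gamma)


# Hypothetical sequence: 8 low-entropy tokens (green at the chance rate,
# 2 of 8) followed by 8 high-entropy tokens that all carry the watermark.
green_flags = [1, 1, 0, 0, 0, 0, 0, 0] + [1] * 8
entropies = [0.1] * 8 + [2.0] * 8

full = z_score(green_flags)
selective = selective_z_score(green_flags, entropies, tau=1.2)
print(f"all tokens: z = {full:.3f}")
print(f"high-entropy tokens only: z = {selective:.3f}")
```

In this example both scores share the same numerator (6 green tokens above expectation), but the selective score divides by the standard deviation of 8 tokens rather than 16, which is why it comes out higher.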

Figures (14)

  • Figure 1: Illustrated comparison of $\textsc{WLLM}$ (Kirchenbauer et al., 2023) and $\textsc{SWEET}$ (ours). Note that this is a short, hypothetical example for explanation. LLMs can generate working source code (a) without a watermark. Strong watermark (b) or weak watermark (c) may result in detection or correctness failure, but (d) selective watermarking may avoid both failures.
  • Figure 2: A real example from HumanEval/4 comparing (a) WLLM and (b)-(d) our SWEET with different thresholds. Text colors annotate whether tokens are in the green or red list. Gray tokens have entropy smaller than the threshold and are not watermarked. The intensity of the yellow background color visualizes the entropy value. (a) While WLLM produces incorrect code and less detectable watermarks with few green tokens (low z-score), (b)-(d) SWEET improves both code quality and z-score by selectively embedding and detecting watermarks using an entropy threshold. Interestingly, (c) the z-score peaks with a moderate threshold, and (d) as the threshold increases, the z-score declines due to the decrease in the watermarking ratio.
  • Figure 3: The tradeoff between AUROC and pass@1 of detecting real and generated samples of HumanEval, MBPP, and DS-1000 datasets. The pink line represents a Pareto frontier of $\textsc{SWEET}$, while the blue line represents that of $\textsc{WLLM}$. $\textsc{SWEET}$ shows consistent dominance. The red/orange line and circles are the points used in Table \ref{tab:table_main}. The entropy threshold for $\textsc{SWEET}$ is 1.2 here, and Pareto frontier figures for all threshold values are in Figure \ref{fig:pareto_frontier_appendix}.
  • Figure 4: Plots of code quality (pass@1) and detection AUROC when calibrating the entropy threshold of our method, $\textsc{SWEET}$, on the three code benchmarks. We set $\gamma=0.25$ and $\delta=3.0$. While code generation performance increases with a higher entropy threshold, detection AUROC first rises and then falls.
  • Figure 5: Watermark detection performance on renamed variables in the code. We set $\gamma=0.25$ and $\delta=3.0$ for $\textsc{WLLM}$ and $\textsc{SWEET}$. For $\textsc{EXP-edit}$, we search the hyperparameter for the block size in [20,30,40] with a high entropy setting.
  • ...and 9 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Lemma C.1
  • proof