Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code

Jungin Kim, Shinwoo Park, Yo-Sub Han

TL;DR

We present STONE, a syntax-aware watermarking method that embeds watermarks only in non-syntactic tokens to preserve code integrity, and introduce STEM, a comprehensive evaluation framework that balances three critical dimensions: correctness, detectability, and imperceptibility.

Abstract

Identifying LLM-generated code through watermarking poses a challenge in preserving functional correctness. Previous methods rely on the assumption that watermarking high-entropy tokens effectively maintains output quality. Our analysis reveals a fundamental limitation of this assumption: syntax-critical tokens such as keywords often exhibit the highest entropy, making existing approaches vulnerable to logic corruption. We present STONE, a syntax-aware watermarking method that embeds watermarks only in non-syntactic tokens and preserves code integrity. For its rigorous assessment, we also introduce STEM, a comprehensive framework that balances three critical dimensions: correctness, detectability, and imperceptibility. Across Python, C++, and Java, STONE preserves correctness, sustains strong detectability, and achieves balanced performance with minimal overhead. Our implementation is available at https://anonymous.4open.science/r/STONE-watermarking-AB4B/.
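To make the idea of watermarking only non-syntactic tokens concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a green-list style scheme in which a keyed hash of the previous token selects a fraction $\gamma$ of the candidate tokens and a bias $\delta$ is added to their logits. The token categories, the names `is_syntactic`, `green_list`, and `watermark_logits`, the key string, and the toy dictionary of logits are all hypothetical and chosen for illustration only.

```python
import hashlib
import keyword
import random

# Hypothetical token categories, loosely following the types named in Figure 2.
OPERATORS = {"+", "-", "*", "/", "=", "==", "<", ">"}
DELIMITERS = {"(", ")", "[", "]", "{", "}", ":", ",", "."}
TYPES = {"int", "float", "str", "bool", "list", "dict"}


def is_syntactic(token: str) -> bool:
    """Return True for keywords, operators, delimiters, types, and whitespace."""
    return (
        keyword.iskeyword(token)
        or token in OPERATORS
        or token in DELIMITERS
        or token in TYPES
        or token.isspace()
    )


def green_list(prev_token, vocab, gamma, key="demo-key"):
    """Pick a keyed pseudo-random fraction gamma of the non-syntactic tokens."""
    candidates = sorted(t for t in vocab if not is_syntactic(t))
    digest = hashlib.sha256(f"{key}|{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    rng.shuffle(candidates)
    return set(candidates[: int(gamma * len(candidates))])


def watermark_logits(logits, prev_token, gamma=0.5, delta=2.0):
    """Add delta to green-list logits; syntactic tokens are never modified."""
    green = green_list(prev_token, logits.keys(), gamma)
    return {t: (v + delta if t in green else v) for t, v in logits.items()}


# Toy next-token logits: five syntactic tokens and four identifiers.
toy_logits = {t: 0.0 for t in
              ["if", "return", "(", "=", " ", "total", "count", "value", "result"]}
biased = watermark_logits(toy_logits, prev_token="def")
print({t: v for t, v in biased.items() if v != 0.0})
```

Because the green list is drawn only from non-syntactic candidates, the relative ranking of keywords, operators, and delimiters is left exactly as the model produced it, which is the property the abstract attributes to STONE.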

Paper Structure

This paper contains 31 sections, 7 equations, 4 figures, 13 tables, 2 algorithms.

Figures (4)

  • Figure 1: Motivating example of STONE watermarking. Existing methods modify syntax tokens and cause syntax errors, whereas STONE preserves syntax and embeds watermarks without breaking code structure.
  • Figure 2: Entropy values by token type. Tokens that do not fall under keywords, whitespace, types, delimiters, or operators are categorized as etc tokens. The etc category is the primary target for our proposed STONE watermarking method. Token entropy is measured using Qwen2.5-Coder-7B.
  • Figure 3: The trade-off between pass@1 (Y-axis) and detectability (X-axis). We analyze the impact of the green list ratio $\gamma$ (indicated by marker color) and the watermark strength $\delta$ (indicated by marker size). Circular and triangular markers represent STONE and SWEET, respectively.
  • Figure 4: We evaluate the robustness of watermarking methods by applying two types of adversarial perturbations—code refactoring and paraphrasing—to watermarked code. Following these attacks, we assess the effectiveness of watermark detection. We compare the performance of watermarks embedded using SWEET and STONE to determine their resilience under both attack scenarios.
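The detectability axis in Figure 3 depends on the green list ratio $\gamma$. As a rough illustration, the sketch below computes the one-proportion z-score commonly used by green-list watermark detectors. This is an assumption about the detection statistic: this section does not show the paper's exact formula, and the function name `green_z_score` and the example counts are hypothetical.

```python
import math


def green_z_score(green_hits: int, scored_tokens: int, gamma: float) -> float:
    """z-score of the observed green-token count against the no-watermark null.

    Under the null hypothesis, each scored token lands in the green list
    with probability gamma, so the count has mean gamma * T and variance
    T * gamma * (1 - gamma), where T is the number of scored tokens.
    """
    if scored_tokens <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("need scored_tokens > 0 and 0 < gamma < 1")
    expected = gamma * scored_tokens
    std = math.sqrt(scored_tokens * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


# Hypothetical counts: 75 of 100 scored tokens fall in the green list.
print(green_z_score(green_hits=75, scored_tokens=100, gamma=0.5))  # 5.0
```

In a syntax-aware scheme, `scored_tokens` would presumably count only the non-syntactic tokens, since those are the only positions where a watermark could have been embedded.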