Optimizing Token Choice for Code Watermarking: An RL Approach

Zhimeng Guo, Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Minhao Cheng

TL;DR

CodeTracer introduces an adaptive, policy-driven watermarking framework for LLM-generated code that embeds detectable statistical watermarks during generation without compromising functionality. It couples a trainable watermark policy with a frozen base LLM, trained via GRPO to optimize both execution correctness and watermark detectability using a dual reward structure and differentiable discrete-token decisions (Straight-Through Estimation and Gumbel-Top-k). The approach demonstrates superior watermark detectability (AUROC) and maintained code quality (Pass@1) across Python and cross-language benchmarks, with limited computational overhead and strong robustness to attacks and model transfer. Practically, CodeTracer enables plug-in watermarking for diverse code-generation models, offering scalable IP protection and attribution in real-world AI code production. The framework advances watermarking by integrating syntactic awareness, verifiable rewards, and efficient learning to operate within the constraints of structured programming languages.

Abstract

Protecting intellectual property in LLM-generated code requires effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that uses a parameterized model to bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a reward system that integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations show that CodeTracer significantly outperforms state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality.
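
To make the token-biasing idea concrete, the following is a minimal sketch, not the CodeTracer implementation. It assumes a watermark policy that emits one score per vocabulary entry, selects k favored tokens with the Gumbel-Top-k trick (add Gumbel(0, 1) noise to the scores, keep the k largest), and adds a fixed bias `delta` to the frozen LLM's logits for those tokens. The names `gumbel_top_k`, `bias_logits`, `policy_scores`, and `delta`, as well as all numeric values, are hypothetical. The differentiable relaxation needed for training (for example, straight-through estimation) is omitted.

```python
import math
import random


def gumbel_top_k(scores, k, rng):
    """Return k distinct indices chosen by perturbing scores with Gumbel(0, 1) noise."""
    perturbed = []
    for index, score in enumerate(scores):
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, index))
    perturbed.sort(reverse=True)
    return sorted(index for _, index in perturbed[:k])


def bias_logits(logits, favored, delta):
    """Add delta to the logits of favored tokens; leave all other logits unchanged."""
    return [x + delta if i in favored else x for i, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy six-token vocabulary; every value below is made up for illustration.
rng = random.Random(0)
llm_logits = [2.0, 1.0, 0.5, 0.0, -1.0, -2.0]      # assumed output of the frozen LLM
policy_scores = [0.1, 1.5, -0.3, 0.8, 0.0, -1.0]   # assumed output of the watermark policy

favored = set(gumbel_top_k(policy_scores, k=2, rng=rng))
base_probs = softmax(llm_logits)
watermarked_probs = softmax(bias_logits(llm_logits, favored, delta=2.0))
```

In this sketch, every favored token ends up with a strictly higher sampling probability than under the unmodified LLM, which is the kind of statistical deviation a detector can later test for. How CodeTracer actually parameterizes the policy, chooses k, and sets the bias is not specified in this section.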

Paper Structure

This paper contains 39 sections, 14 equations, 4 figures, 6 tables, 2 algorithms.

Figures (4)

  • Figure 1: CodeTracer: A framework for LLM code watermarking through selective token biasing. The diagram shows our end-to-end pipeline, in which a trainable watermark model collaborates with an LLM to embed detectable statistical patterns in generated code. A reward system optimizes the dual objectives of preserving code functionality and maximizing watermark detectability. The watermark model operates as a plug-in module, enabling deployment with LLMs beyond those used during training. Importantly, watermark detection requires only the watermark model, not the original LLM.
  • Figure 2: Training on a 1.5B LLM and evaluating on the 8B OpenCoder-8B-Instruct LLM.
  • Figure 3: Cross-language evaluation on Java and C++. CodeTracer achieves consistent cross-language performance.
  • Figure 4: Training dynamics for CodeTracer-1 with RL training.