CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models
Yu Zhang, Shuliang Liu, Xu Yang, Xuming Hu
TL;DR
CATMark tackles the challenge of watermarking LLM outputs across heterogeneous generation tasks with a context-aware, threshold-adaptive framework. It clusters token-generation contexts by their logit distributions, computes a per-context entropy threshold from historical data, and applies the watermark bias only to tokens whose entropy exceeds that threshold. The approach eliminates preset thresholds, preserves fidelity in low-entropy content such as code, and reports strong AUROC and pass@k results across code, math, and StackEval benchmarks, indicating both reliable detection and preserved generation quality. It also withstands rewriting attacks such as back-translation and paraphrasing at acceptable computational overhead, making it practical for real-world use.
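The generation mechanism summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The clustering rule (nearest centroid under L1 distance, with a hypothetical `radius`), the threshold statistic (running mean entropy of earlier contexts in the same cluster), and the bias step (a KGW-style keyed green list with hypothetical `gamma`, `delta`, and `key`) are all stand-ins chosen for brevity. The names `ContextThresholds`, `green_list`, and `watermark_logits` are hypothetical.

```python
import hashlib
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class ContextThresholds:
    """Hypothetical online clusterer: each next-token distribution joins the
    nearest centroid (L1 distance) or opens a new cluster; the mean entropy of
    a cluster's past contexts is used as that cluster's threshold."""

    def __init__(self, radius=0.5):
        self.radius = radius      # assumed max L1 distance for joining a cluster
        self.centroids = []       # running-mean probability vector per cluster
        self.counts = []          # number of contexts seen per cluster
        self.entropy_sums = []    # running sum of entropies per cluster

    def assign(self, probs):
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = sum(abs(a - b) for a, b in zip(probs, c))
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            self.centroids.append(list(probs))
            self.counts.append(0)
            self.entropy_sums.append(0.0)
            return len(self.centroids) - 1
        return best

    def threshold_and_update(self, probs):
        """Return (entropy, threshold from past data), then record this step."""
        k = self.assign(probs)
        h = entropy(probs)
        n = self.counts[k]
        # With no history the threshold equals h, so the gate stays closed.
        thr = self.entropy_sums[k] / n if n else h
        self.centroids[k] = [(c * n + p) / (n + 1)
                             for c, p in zip(self.centroids[k], probs)]
        self.counts[k] += 1
        self.entropy_sums[k] += h
        return h, thr


def green_list(prev_token, vocab_size, gamma=0.5, key="secret"):
    """Hypothetical keyed partition: a gamma fraction of the vocabulary,
    seeded by the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, prev_token, thresholds, delta=2.0, gamma=0.5):
    """Add delta to green-list logits only when entropy exceeds the
    context's threshold; returns (new_logits, was_watermarked)."""
    h, thr = thresholds.threshold_and_update(softmax(logits))
    if h <= thr:
        return list(logits), False  # low-entropy context: leave untouched
    green = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)], True
```

Because the threshold is read from a cluster's history before the current step is recorded, the first context in a new cluster is never watermarked. This conservative behavior is a choice of the sketch, not something the summary specifies.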
Abstract
Watermarking algorithms for Large Language Models (LLMs) identify machine-generated content by embedding hidden statistical features in text and later detecting them. However, this embedding degrades text quality, especially in low-entropy scenarios such as code generation, where few tokens are plausible at each step. Existing methods that rely on entropy thresholds often require costly tuning and adapt poorly to unknown or cross-task generation scenarios. We propose Context-Aware Threshold watermarking (CATMark), a novel framework that dynamically adjusts watermarking intensity based on real-time semantic context. CATMark partitions text generation into semantic states by clustering logits, establishing context-aware entropy thresholds that preserve fidelity in structured content while still embedding robust watermarks. Crucially, it requires no predefined thresholds or task-specific tuning. Experiments show that CATMark improves text quality in cross-task settings without sacrificing detection accuracy.
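Detection of such a watermark can be illustrated with the one-proportion z-test commonly used for green-list watermarks, $z = (g - \gamma T)/\sqrt{T\gamma(1-\gamma)}$, where $T$ is the number of scored tokens and $g$ the number that fall in the green list. This is a hedged sketch: it assumes the detector scores only positions where the entropy gate was open (supplied as a hypothetical `gated` list) and uses a hypothetical keyed `green_list` partition. Whether CATMark's detector scores tokens exactly this way is an assumption.

```python
import hashlib
import math
import random


def green_list(prev_token, vocab_size, gamma=0.5, key="secret"):
    """Hypothetical keyed partition: a gamma fraction of the vocabulary,
    seeded by the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def detect(tokens, gated, vocab_size, gamma=0.5, key="secret"):
    """Return (z_score, scored_count).

    tokens[i] is the i-th token; gated[i] marks whether the entropy gate was
    open at step i. Position 0 has no predecessor and is never scored."""
    hits = scored = 0
    for i in range(1, len(tokens)):
        if not gated[i]:
            continue  # low-entropy position: carries no watermark signal
        scored += 1
        if tokens[i] in green_list(tokens[i - 1], vocab_size, gamma, key):
            hits += 1
    if scored == 0:
        return 0.0, 0
    z = (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
    return z, scored
```

Skipping ungated positions keeps low-entropy tokens, which an entropy-gated generator never biases, from diluting the green-token count.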
