PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints

Jiahao Huo, Shuliang Liu, Bin Wang, Junyan Zhang, Yibo Yan, Aiwei Liu, Xuming Hu, Mingxun Zhou

TL;DR

PMark is a simple yet powerful SWM method that dynamically estimates the PF median for the next sentence through sampling, while enforcing multiple PF constraints (which the authors call channels) to strengthen watermark evidence.

Abstract

Semantic-level watermarking (SWM) for large language models (LLMs) enhances watermarking robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and rejection-sampling-based generation often introduces significant distribution distortions compared with unwatermarked outputs. In this work, we introduce a new theoretical framework for SWM through the concept of proxy functions (PFs), that is, functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that dynamically estimates the PF median for the next sentence through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves robustness against paraphrasing-style attacks. We also provide an empirically optimized version that removes the requirement for dynamic median estimation, giving better sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
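
The idea of splitting sampled candidates around a PF median, one channel at a time, can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_proxy` is a hypothetical hash-based stand-in for a real semantic proxy function, and `select_candidate`, `key_bits`, and the rule "bit 1 keeps the upper half" are all assumptions made for illustration.

```python
import hashlib
import random
import statistics


def toy_proxy(sentence: str, channel: int) -> float:
    """Hypothetical stand-in for a proxy function: maps a sentence to [0, 1).

    A real PF would be semantic (so paraphrases score similarly);
    a hash is used here only to keep the sketch self-contained.
    """
    digest = hashlib.sha256(f"{channel}:{sentence}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def select_candidate(candidates, key_bits, rng):
    """Filter sampled candidates channel by channel around the sample median."""
    survivors = list(candidates)
    for channel, bit in enumerate(key_bits):
        if len(survivors) < 2:
            break  # nothing left to split
        values = [toy_proxy(s, channel) for s in survivors]
        median = statistics.median(values)  # median estimated from the samples
        # Assumed convention: bit 1 keeps values above the median, bit 0 the rest.
        kept = [s for s, v in zip(survivors, values) if (v > median) == bool(bit)]
        if kept:
            survivors = kept
    return rng.choice(survivors)


if __name__ == "__main__":
    sampled = [f"candidate sentence {i}" for i in range(8)]
    print(select_candidate(sampled, key_bits=[1, 0], rng=random.Random(0)))
```

In this sketch each channel roughly halves the surviving candidates, so the number of usable channels is limited by how many candidates are sampled per sentence.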

Paper Structure

This paper contains 66 sections, 9 theorems, 54 equations, 14 figures, 6 tables, 4 algorithms.

Key Result

Lemma 1

For any fixed green set $S\subseteq U$ and any $N\ge 1$, the distribution of the output sentence $Y$ is independent of $N$ and equals the natural distribution conditioned on $\mathcal{F}\in S$. Specifically, for any $s\in\Sigma^*$ with $u=\mathcal{F}(s)$,
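
Written out, conditioning the natural distribution on the green set has the generic form below. Here $P$ denotes the natural (unwatermarked) sentence distribution and $\mathbb{1}[\cdot]$ the indicator function; both symbols are assumptions introduced for illustration, the green set is assumed to have nonzero probability, and the paper's own display may use different notation.

$$
\Pr[Y = s] \;=\; \frac{\mathbb{1}[u \in S]\; P(s)}{\sum_{s' \in \Sigma^*:\ \mathcal{F}(s') \in S} P(s')}
$$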

Figures (14)

  • Figure 1: Illustration of the PMark pipeline in 2D space, with robustness enhanced by multi-channel constraints. Note that we use orthogonal pivots and a distortion-free partition in practice.
  • Figure 2: PMark Online Generation
  • Figure 3: Results of Mistral-7B on the C4 dataset. Smaller bubbles denote lower PPL.
  • Figure 4: TP@FP1% under Word-D and Word-S attacks.
  • Figure 5: Performance of Mistral-7B under different hyperparameter settings ($K=150, 250, +\infty$ and $\delta=0, 0.001$).
  • ...and 9 more figures

Theorems & Definitions (25)

  • Definition 1: Single-sentence Distortion-Free
  • Definition 2: Equivalence via Watermark Code Space
  • Definition 3: Probability Measure on $\Sigma^*$
  • Lemma 1: Probability Scaling of Green Region
  • Theorem 2: Closed-form of Watermarked PMF
  • Corollary 2.1
  • Theorem 3: Distortion-free on a Single Channel
  • proof
  • Theorem 4: Semantic Robustness on Single Channel
  • Remark 1: Multi-channel Distortion-free
  • ...and 15 more