
PVMark: Enabling Public Verifiability for LLM Watermarking Schemes

Haohua Duan, Liyao Xiang, Xin Zhang

TL;DR

PVMark introduces public verifiability to LLM watermarking by integrating zero-knowledge proofs into watermark detection, allowing third parties to verify correctness without exposing secret keys. It adapts KGW, SynthID-Text, and Segment-Watermark into ZKP-friendly forms using hash-based vocabulary partitioning and cryptographic hashes as PRFs, complemented by PLONKish circuits and Merkle proofs. The approach is extended with recursive ZKP (Nova folding) to improve efficiency, and comprehensive evaluations show minimal impact on watermark properties while achieving practical verification costs on BN254-based circuits. This work enables credible, auditable ownership verification and provenance tracing for AI-generated content with realistic deployment potential in watermarking services.

Abstract

Watermarking schemes for large language models (LLMs) have been proposed to identify the source of generated text, mitigating the potential threats arising from model theft. However, current watermarking solutions hardly resolve the trust issue: a non-public watermark detection cannot prove that it faithfully conducts the detection. We observe that this is attributable to the secret key used in most watermark detection: it cannot be public, or an adversary given the key may launch removal attacks; nor can it be private, or the watermark detection is opaque to the public. To resolve the dilemma, we propose PVMark, a plugin based on zero-knowledge proof (ZKP) that enables the watermark detection process to be publicly verifiable by third parties without disclosing any secret key. PVMark hinges upon a proof of 'correct execution' of watermark detection, for which a set of ZKP constraints is built, including mapping, random number generation, comparison, and summation. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols, to show that our approach works effectively under a variety of circumstances. Experimental results show that PVMark efficiently enables public verifiability on state-of-the-art LLM watermarking schemes without compromising watermarking performance, making it promising for practical deployment.
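
The four constraint types named in the abstract (mapping, random number generation, comparison, and summation) can be illustrated with a minimal plain-Python sketch of a KGW-style detector that partitions the vocabulary by hashing. This is not the authors' implementation: the function names `prf`, `is_green`, and `detect` are hypothetical, SHA-256 stands in for the ZKP-friendly hashes (MiMC, Poseidon, Poseidon2) used in the paper, and the green-list fraction of 0.5 is an assumed parameter. The z-score follows the standard KGW statistic $z = (|s|_G - \gamma T) / \sqrt{T \gamma (1 - \gamma)}$.

```python
import hashlib
import math

GAMMA = 0.5  # assumed green-list fraction (not taken from the paper)


def prf(secret_key: bytes, prev_token: int, token: int) -> int:
    """Random number generation: a keyed hash used as a PRF.

    SHA-256 is a stand-in here; a circuit would use a ZKP-friendly hash.
    Returns a 64-bit integer derived from the key and the two token ids.
    """
    data = secret_key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def is_green(secret_key: bytes, prev_token: int, token: int,
             gamma: float = GAMMA) -> bool:
    """Comparison: a token is 'green' if its PRF output falls below gamma * 2^64."""
    return prf(secret_key, prev_token, token) < int(gamma * 2**64)


def detect(secret_key: bytes, token_ids: list[int], gamma: float = GAMMA):
    """Mapping + summation: score each (previous, current) token pair and
    return the green-token count together with the KGW z-score."""
    scored = len(token_ids) - 1  # the first token has no predecessor
    green = sum(
        is_green(secret_key, prev, tok, gamma)
        for prev, tok in zip(token_ids, token_ids[1:])
    )
    z = (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
    return green, z


if __name__ == "__main__":
    # Token ids are arbitrary illustrative values, not real tokenizer output.
    tokens = [(7 * i + 3) % 1000 for i in range(60)]
    green, z = detect(b"demo-key", tokens)
    print(f"green tokens: {green} of {len(tokens) - 1}, z-score: {z:.3f}")
```

In a publicly verifiable setting, each of these steps would be expressed as circuit constraints with `secret_key` as a private witness, so that a verifier can check the reported count and z-score without learning the key.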

Paper Structure

This paper contains 21 sections, 13 equations, 4 figures, 7 tables, 3 algorithms.

Figures (4)

  • Figure 1: From top to bottom: ➀ The LLM owner detects unauthorized use of the LLM by watermark detection. ➁ The LLM owner sues the suspected model and the court requests evidence. ➂ The LLM owner's secret key gets exposed in answering the court's request. ➌ The LLM owner provides the proof as evidence under PVMark, thus preventing the attack.
  • Figure 2: ZKP costs for PVMark vs. different numbers of tokens: the left two columns are for detection by KGW and the right two columns are for detection by SynthID-Text. $A, B, C$ denote the MiMC, Poseidon, and Poseidon2 variants, respectively, and $1$ and $2$ represent the use of a two-to-one hash and a three-to-one hash, respectively.
  • Figure 3: ZKP costs for the Nova version of PVMark vs. varying $N_t, N_f$ ($N_t \times N_f = 200$ tokens): the upper row is for detection by KGW and the lower row is for detection by SynthID-Text. ST-1 and ST-2 denote the setup time to prove the final instance and the folding process, respectively, and PT-1 and PT-2 are the corresponding proving times. The total time cost is ST-1 + ST-2 + PT-1 + PT-2.
  • Figure 4: ZKP costs for the Nova version of PVMark vs. different numbers of tokens: $N_t$ is set such that the sum of setup time and proving time is minimized.
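
The trade-off between $N_t$ and $N_f$ in Figures 3 and 4 can be pictured with a small sketch of chunked detection. It assumes, as an interpretation not confirmed by the captions, that $N_t$ is the number of tokens handled per folding step and $N_f$ is the number of steps. It contains no Nova or ZKP code: it only shows that a running state (green count, previous token) carried across steps gives the same result for every factorization of 200. All names are hypothetical and SHA-256 stands in for a ZKP-friendly hash.

```python
import hashlib


def green_flag(key: bytes, prev_token: int, token: int) -> int:
    """Return 1 if the keyed hash of (prev_token, token) lands in the lower
    half of the 64-bit range, else 0 (assumed green-list fraction of 0.5)."""
    data = key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    value = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return 1 if value < 2**63 else 0


def fold_step(key: bytes, state: tuple[int, int], chunk: list[int]) -> tuple[int, int]:
    """One step: consume a chunk of tokens and update (green_count, prev_token)."""
    count, prev = state
    for tok in chunk:
        count += green_flag(key, prev, tok)
        prev = tok
    return count, prev


def chunked_count(key: bytes, tokens: list[int], n_t: int) -> int:
    """Count green tokens by processing n_t scored tokens per step."""
    state = (0, tokens[0])  # the first token only seeds the context
    body = tokens[1:]
    for start in range(0, len(body), n_t):
        state = fold_step(key, state, body[start:start + n_t])
    return state[0]


if __name__ == "__main__":
    # 201 arbitrary token ids give 200 scored tokens.
    tokens = [(13 * i + 5) % 997 for i in range(201)]
    for n_t in (1, 10, 20, 50, 200):
        n_f = 200 // n_t
        print(f"N_t={n_t:3d}, N_f={n_f:3d}: green count = "
              f"{chunked_count(b'demo-key', tokens, n_t)}")
```

Because the result does not depend on the chunk size, the choice of $N_t$ can be made purely on cost grounds, which is the tuning that Figure 4 describes.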

Theorems & Definitions (1)

  • Definition 4.1