Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

Zhuoshang Wang; Yubing Ren; Yanan Cao; Fang Fang; Xiaoxue Li; Li Guo

Abstract

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.

Paper Structure (66 sections, 21 equations, 10 figures, 4 tables)

Introduction
Related Work
Private-Key Based LLM Watermarking
Publicly Detectable Watermarking
Preliminary
LLM Generation
LLM Watermarking
Logits-based Watermarking.
Sampling-based Watermarking.
TTP-Detect: A Black-Box Third-Party Detection Framework
Threat Model and Problem Formulation
Threat Model.
Problem Formulation.
TTP-Detect Framework Overview.
Proxy-Based Representation Extraction
...and 51 more sections

Figures (10)

Figure 1: TTP-Detect framework overview.
Figure 2: Robustness across various attack types.
Figure 3: Ablation study for (a) proxy model and (b) relative measurement modules.
Figure 4: (a) Impact of the size of reference set; (b) hyperparameter analysis of $k$ in Local Consistency Test.
Figure 5: Instruction template used to construct the LoRA fine-tuning data for representation extraction models.
...and 5 more figures
