Watermark-based Attribution of AI-Generated Content

Zhengyuan Jiang; Moyang Guo; Yuepeng Hu; Yupu Wang; Neil Zhenqiang Gong

TL;DR

This work addresses the challenge of attributing AI-generated content to individual GenAI users by embedding a unique watermark per user and detecting content via its bitwise similarity $BA$ to the users' watermarks with threshold $\tau>0.5$. The authors develop a two-step framework: (1) a theoretical analysis that derives lower bounds for the true detection rate $TDR_i$ and true attribution rate $TAR_i$, and an upper bound for the false detection rate $FDR$, for any given watermark set, and (2) a watermark selection method that minimizes the maximum pairwise watermark similarity using an approximate farthest-string approach (A-BSTA) to maximize these bounds. They validate the approach on AI-generated images from Stable Diffusion, Midjourney, and DALL-E 2 using three watermarking methods (HiDDeN, StegaStamp, PRC), showing high $TDR$ and $TAR$ with near-zero $FDR$ when content is not post-processed or undergoes common post-processing, with clear degradation under adversarial post-processing. The work also provides a practical, scalable pipeline for per-user watermark assignment and discusses privacy, forgery risks, and extensions to other modalities. Overall, watermark-based attribution emerges as a feasible, statistically grounded mechanism for linking AI-generated content back to specific users, enabling accountability in real-world GenAI ecosystems.
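The detection and attribution rule summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, watermarks are modeled as lists of 0/1 bits, and the use of `>=` (rather than `>`) when comparing against $\tau$ is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def bitwise_accuracy(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length bitstrings agree (BA)."""
    if len(a) != len(b):
        raise ValueError("watermarks must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def detect_and_attribute(
    decoded: Sequence[int],
    user_watermarks: List[Sequence[int]],
    tau: float,
) -> Tuple[Optional[int], float]:
    """Return (user index or None, best BA).

    The content is flagged as AI-generated when the best-matching user
    watermark reaches the threshold tau (assumed comparison: >=), and it is
    then attributed to that best-matching user.
    """
    best_user, best_ba = None, -1.0
    for idx, w in enumerate(user_watermarks):
        ba = bitwise_accuracy(decoded, w)
        if ba > best_ba:
            best_user, best_ba = idx, ba
    if best_ba >= tau:
        return best_user, best_ba
    return None, best_ba


# Toy example with two users and 8-bit watermarks.
watermarks = [[0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0, 0]]
user, ba = detect_and_attribute([0, 0, 0, 0, 1, 1, 1, 0], watermarks, tau=0.75)
```

In the toy example, the decoded bits differ from the first user's watermark in one position out of eight, so the content is attributed to user index 0 with a bitwise accuracy of 0.875.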

Abstract

Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution--the ability to trace back to the user of a generative AI (GenAI) service who created a given AI-generated content--remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: How should watermarks be selected for users to maximize attribution performance? To address the challenge, we first theoretically derive lower bounds on detection and attribution performance through rigorous probabilistic analysis for any given set of user watermarks. Then, we select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is either not post-processed or subjected to common post-processing such as JPEG compression, as well as black-box adversarial post-processing with limited query budgets.
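To make the watermark selection idea concrete, the sketch below assigns watermarks one user at a time so that each new watermark has a low maximum bitwise similarity to those already assigned, which is the objective the TL;DR describes. It is a simple greedy random-candidate heuristic chosen for illustration, not the A-BSTA algorithm from the paper; all names and parameters (`num_candidates`, `seed`) are hypothetical.

```python
import random
from typing import List


def bitwise_accuracy(a: List[int], b: List[int]) -> float:
    """Fraction of positions at which two equal-length bitstrings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_watermarks(
    num_users: int, n_bits: int, num_candidates: int = 32, seed: int = 0
) -> List[List[int]]:
    """Greedy heuristic (not A-BSTA): for each new user, draw random
    candidates and keep the one whose maximum similarity to the already
    selected watermarks is smallest."""
    rng = random.Random(seed)
    selected = [[rng.randint(0, 1) for _ in range(n_bits)]]
    while len(selected) < num_users:
        best_cand, best_score = None, 2.0  # any real score is <= 1.0
        for _ in range(num_candidates):
            cand = [rng.randint(0, 1) for _ in range(n_bits)]
            score = max(bitwise_accuracy(cand, w) for w in selected)
            if score < best_score:
                best_cand, best_score = cand, score
        selected.append(best_cand)
    return selected


def max_pairwise_similarity(watermarks: List[List[int]]) -> float:
    """Largest bitwise accuracy between any two distinct watermarks."""
    return max(
        bitwise_accuracy(watermarks[i], watermarks[j])
        for i in range(len(watermarks))
        for j in range(i + 1, len(watermarks))
    )


chosen = select_watermarks(num_users=5, n_bits=16)
worst_case = max_pairwise_similarity(chosen)
```

A lower `worst_case` means the most similar pair of users is easier to tell apart. Increasing `num_candidates` trades extra computation for a better chance of finding a well-separated watermark at each step.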

Paper Structure (33 sections, 6 theorems, 22 equations, 15 figures, 7 tables, 3 algorithms)

Introduction
Related Work
Problem Formulation
Watermark-based Attribution
Detection and Attribution Performance
Evaluation Metrics
Formal Quantification of Watermarking
Detection and Attribution Performance
Selecting Watermarks for Users
Formulating a Watermark Selection Problem
Solving the Watermark Selection Problem
Experiments
Detection and Attribution Results in Different Scenarios
Comparing Watermark Selection Methods
Theoretical vs. Empirical Results
...and 18 more sections

Key Result

Theorem 1

Suppose we are given $s$ users with any $s$ watermarks $W=\{w_1, w_2, \ldots, w_s\}$. When the watermarking method is $\beta_i$-accurate for user $U_i$'s AI-generated content, we have a lower bound of TDR$_i$ in terms of $\tau$, $\underline{\alpha_i}$, and $n_i$, where $0.5 < \tau < \beta_i$, $\underline{\alpha_i}=\min_{j \in \{1,2,\ldots,s\}\setminus\{i\}} BA(w_i, w_j)$, and $n_i\sim B(n, \beta_i)$ (binomial distribution).
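The quantities named in Theorem 1 are straightforward to compute numerically. The sketch below evaluates $\underline{\alpha_i}$ for a given watermark set and a binomial tail probability $\Pr(n_i \geq k)$ for $n_i \sim B(n, \beta_i)$. It assumes that $n_i$ counts the bits of the decoded watermark that match $w_i$ and that $k = \lceil \tau n \rceil$ is a natural cutoff. The inequality itself is not reproduced here, so this tail is an illustrative building block and should not be read as the theorem's bound. Function names are hypothetical.

```python
from math import comb
from typing import List


def bitwise_accuracy(a: List[int], b: List[int]) -> float:
    """Fraction of positions at which two equal-length bitstrings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def min_pairwise_ba(i: int, watermarks: List[List[int]]) -> float:
    """underline{alpha_i}: the minimum BA(w_i, w_j) over all j != i."""
    return min(
        bitwise_accuracy(watermarks[i], w)
        for j, w in enumerate(watermarks)
        if j != i
    )


def binomial_tail_at_least(n: int, p: float, k: int) -> float:
    """Pr(X >= k) for X ~ B(n, p), computed by direct summation."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(k, n + 1))


# Toy values: three 4-bit watermarks; the tail uses n = 4, p = 0.5, k = 2.
toy_watermarks = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
alpha_0 = min_pairwise_ba(0, toy_watermarks)   # min(0.5, 0.0)
tail = binomial_tail_at_least(4, 0.5, 2)       # (6 + 4 + 1) / 16
```

Passing the integer cutoff `k` directly, instead of computing it from `tau * n` inside the function, avoids floating-point surprises when rounding up.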

Figures (15)

Figure 1: Registration, generation, and detection & attribution phases.
Figure 2: Ranked TARs of the 100,000 users.
Figure 3: (a) Ranked $TAR_i$ of the worst 1K users for the three selection methods. (b) Theoretical vs. empirical results.
Figure 4: Taxonomy of detection and attribution results. Nodes with red color indicate incorrect detection/attribution.
Figure 5: User-agnostic vs. user-aware results. $w_1$, $w_2$, $w_3$, and $w_4$ are four different random watermarks used in the user-agnostic setting; the user-aware setting uses $s = 100{,}000$ users. The two settings achieve comparable TDR and FDR.
...and 10 more figures

Theorems & Definitions (10)

Definition 1: Detection of AI-generated content
Definition 2: Attribution of AI-generated content
Theorem 1: Lower bound of TDR$_i$
Corollary 1
Theorem 2: Upper bound of FDR
Theorem 3: Alternative upper bound of FDR
Corollary 2
Theorem 4: Lower bound of TAR$_i$
Definition 3: $\beta$-accurate watermarking
Definition 4: $\gamma$-random watermarking
