Watermark-based Attribution of AI-Generated Content
Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Yupu Wang, Neil Zhenqiang Gong
TL;DR
This work addresses the challenge of attributing AI-generated content to the individual GenAI users who created it. Each user is assigned a unique watermark, and content is detected by computing the bitwise accuracy $BA$ between the watermark decoded from the content and each user's watermark, using a detection threshold $\tau > 0.5$. The authors develop a two-step framework: (1) a theoretical analysis that derives lower bounds on each user's true detection rate $TDR_i$ and true attribution rate $TAR_i$, and an upper bound on the false detection rate $FDR$, for any given set of watermarks; and (2) a watermark selection method that minimizes the maximum pairwise similarity among watermarks using an approximate farthest-string approach (A-BSTA), thereby maximizing these bounds. They validate the approach on AI-generated images from Stable Diffusion, Midjourney, and DALL-E 2 using three watermarking methods (HiDDeN, StegaStamp, PRC). Attribution achieves high $TDR$ and $TAR$ with near-zero $FDR$ both without post-processing and under common (non-adversarial) post-processing, but degrades clearly under adversarial post-processing. The work also provides a practical, scalable pipeline for per-user watermark assignment and discusses privacy, forgery risks, and extensions to other modalities. Overall, watermark-based attribution emerges as a feasible, statistically grounded mechanism for linking AI-generated content back to specific users, enabling accountability in real-world GenAI ecosystems.
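The detection and attribution rule summarized above can be sketched as follows. This is a minimal illustration, assuming the decoder outputs an n-bit vector, that content is flagged as AI-generated when the best-matching user's $BA$ reaches $\tau$, and that it is then attributed to that user. The FDR helper is a simple union-bound estimate under the assumption that bits decoded from non-AI content are i.i.d. fair coin flips; it is not necessarily the exact bound derived in the paper. All function names are hypothetical.

```python
from math import ceil, comb

import numpy as np


def bitwise_accuracies(decoded: np.ndarray, user_marks: np.ndarray) -> np.ndarray:
    """BA between the decoded n-bit watermark and each user's watermark.

    decoded: shape (n,); user_marks: shape (s, n); returns shape (s,).
    """
    return np.mean(user_marks == decoded[None, :], axis=1)


def detect_and_attribute(decoded: np.ndarray, user_marks: np.ndarray, tau: float):
    """Assumed rule: flag as AI-generated if the best BA >= tau, then
    attribute to the best-matching user. Returns (detected, user_index)."""
    ba = bitwise_accuracies(decoded, user_marks)
    best = int(np.argmax(ba))
    if ba[best] >= tau:
        return True, best
    return False, None


def fdr_union_bound(n: int, s: int, tau: float) -> float:
    """Union-bound estimate of FDR, assuming bits decoded from non-AI
    content are i.i.d. fair coin flips (an illustrative assumption)."""
    k_min = ceil(tau * n)  # matching bits needed to reach BA >= tau
    tail = sum(comb(n, k) for k in range(k_min, n + 1)) / 2**n
    return min(1.0, s * tail)  # P(any of s users matches) <= s * tail


if __name__ == "__main__":
    marks = np.array([[0, 0, 0, 0, 0, 0, 0, 0],
                      [1, 1, 1, 1, 1, 1, 1, 1],
                      [0, 1, 0, 1, 0, 1, 0, 1]])
    decoded = np.array([1, 1, 1, 1, 1, 1, 1, 0])  # user 1's mark, one bit flipped
    print(detect_and_attribute(decoded, marks, tau=0.8))  # (True, 1)
    print(fdr_union_bound(n=64, s=1000, tau=0.9))
```

Because the union bound grows linearly in the number of users $s$, raising $\tau$ or the watermark length $n$ is what keeps this FDR estimate small as the user base grows.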
Abstract
Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution, i.e., the ability to trace a given piece of AI-generated content back to the user of a generative AI (GenAI) service who created it, remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: how should watermarks be selected for users to maximize attribution performance? To address this challenge, we first derive, through rigorous probabilistic analysis, lower bounds on detection and attribution performance for any given set of user watermarks. We then select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and the (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is not post-processed, is subjected to common post-processing such as JPEG compression, or is subjected to black-box adversarial post-processing with limited query budgets.
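The selection question posed above can be illustrated with a greedy farthest-point heuristic: each new user's watermark is the random candidate whose minimum Hamming distance to the already-assigned watermarks is largest, which pushes down the maximum pairwise similarity. This is a hypothetical stand-in chosen for brevity, not the A-BSTA algorithm used in the paper; the candidate count, seed, and function names are assumptions. The last helper states a simple deterministic reason why separation helps: if the minimum pairwise Hamming distance is $d$, a decoded watermark with at most $\lfloor (d-1)/2 \rfloor$ bit errors relative to user $i$'s watermark is still strictly closest to user $i$, so best-match attribution is correct.

```python
import numpy as np


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of bit positions where two watermarks differ."""
    return int(np.count_nonzero(a != b))


def select_watermarks(s: int, n: int, candidates: int = 256, seed: int = 0) -> np.ndarray:
    """Hypothetical greedy farthest-point heuristic (NOT the paper's A-BSTA):
    each new watermark is the random candidate whose minimum Hamming distance
    to the already-chosen watermarks is largest."""
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(0, 2, size=n, dtype=np.uint8)]
    while len(chosen) < s:
        pool = rng.integers(0, 2, size=(candidates, n), dtype=np.uint8)
        existing = np.stack(chosen)  # shape (m, n)
        # dist[c, j]: Hamming distance from candidate c to chosen watermark j
        dist = np.count_nonzero(pool[:, None, :] != existing[None, :, :], axis=2)
        chosen.append(pool[int(np.argmax(dist.min(axis=1)))])
    return np.stack(chosen)


def min_pairwise_distance(marks: np.ndarray) -> int:
    """Smallest Hamming distance between any two distinct users' watermarks."""
    s, n = marks.shape
    if s < 2:
        return n
    d = np.count_nonzero(marks[:, None, :] != marks[None, :, :], axis=2)
    d[np.diag_indices(s)] = n + 1  # ignore each watermark's distance to itself
    return int(d.min())


def safe_error_budget(marks: np.ndarray) -> int:
    """Max bit errors e with 2e < d_min: within this budget, the true user's
    watermark is strictly the closest, so best-match attribution is correct."""
    return max(0, (min_pairwise_distance(marks) - 1) // 2)


if __name__ == "__main__":
    marks = select_watermarks(s=100, n=32)
    print(min_pairwise_distance(marks), safe_error_budget(marks))
```

Random candidate sampling keeps the sketch short at the cost of optimality; the guarantee from `safe_error_budget` holds for whatever watermark set is actually assigned.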
