Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
Aditya Desu, Xuanli He, Qiongkai Xu, Wei Lu
TL;DR
Protecting intellectual property and tracing the authorship of AI-generated content are increasingly critical, especially when models are served through MLaaS. The authors propose a black-box verification framework that uses iterative re-generation to reveal latent fingerprints in a model's outputs, with a distance metric $D$ that compares each sample to its re-generated version. The approach is grounded in fixed-point theory: when re-generation acts as a contraction with Lipschitz constant $L \in (0,1)$, the distance between successive re-generations shrinks and the iterates converge to a fixed point, which enables ownership verification against a threshold $\delta$. Experiments on natural language and image generation report high precision and recall, robustness to perturbations, and generalization across tasks and datasets, all without modifying model parameters or post-processing outputs.
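The fixed-point argument can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: the "generative model" is replaced by an affine contraction on vectors, $D$ is taken to be the Euclidean distance, and the names `make_model`, `regenerate`, `distance`, and `verify` are hypothetical. The real method operates on text and image models with task-appropriate distance metrics.

```python
import numpy as np


def make_model(seed, dim=8, lipschitz=0.5):
    """Toy stand-in for a generative model (an assumption for illustration).

    Returns the affine map x -> A x + b, where A = lipschitz * Q and Q is
    orthogonal, so the map is a contraction with Lipschitz constant `lipschitz`.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # q is orthogonal
    a = lipschitz * q
    b = rng.normal(size=dim)
    return lambda x: a @ x + b


def distance(x, y):
    """Distance metric D, here simply the Euclidean norm of the difference."""
    return float(np.linalg.norm(x - y))


def regenerate(model, x, k):
    """Apply the model to its own output k times (iterative re-generation)."""
    for _ in range(k):
        x = model(x)
    return x


def verify(model, sample, delta):
    """Simplified ownership check: one more re-generation step should barely
    move a sample that the same model has already re-generated many times."""
    return distance(sample, model(sample)) < delta
```

In this sketch, a sample produced by `regenerate(authentic_model, x0, 20)` lies close to the authentic model's fixed point, so `verify(authentic_model, sample, delta)` passes for a small `delta`, while a differently seeded model will generally move the same sample by a much larger distance. Each re-generation step shrinks the step-to-step distance by the factor `lipschitz`, which mirrors the role of $L$ in the contraction argument.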
Abstract
As machine- and AI-generated content proliferates, protecting the intellectual property of generative models has become imperative, yet verifying data ownership poses formidable challenges, particularly in cases of unauthorized reuse of generated data. This challenge is further amplified by Machine Learning as a Service (MLaaS), which often functions as a black-box system. Our work is dedicated to detecting data reuse from even an individual sample. Traditionally, watermarking has been leveraged to detect AI-generated content. However, watermarking techniques embed additional information as triggers into models or generated content, which can compromise output quality. Our approach instead identifies latent fingerprints inherently present within the outputs through re-generation. We propose an explainable verification procedure that attributes data ownership through re-generation and further amplifies these fingerprints in the generative models through iterative data re-generation. This methodology is theoretically grounded, and we demonstrate its viability and robustness using recent advanced text and image generative models. Beyond protecting the intellectual property of APIs, it addresses important issues such as the spread of misinformation and academic misconduct. It provides a useful tool for ensuring the integrity of sources and authorship, and it applies to other scenarios where authenticity and ownership verification are essential.
