Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation

Aditya Desu; Xuanli He; Qiongkai Xu; Wei Lu

TL;DR

Protecting intellectual property and tracing the authorship of AI-generated content is increasingly critical when models are served through MLaaS. The authors propose a black-box verification framework that uses iterative re-generation to reveal latent fingerprints in a model's outputs, and a distance metric $D$ to compare original samples with their re-generated versions. The approach is grounded in fixed-point theory: when re-generation acts as a contraction with Lipschitz constant $L \in (0,1)$, the distance between successive re-generations converges to 0 as the samples approach a fixed point, which enables ownership verification against a threshold $\delta$. Experiments on natural language and image generation report high precision and recall, robustness to perturbations, and generalization across tasks and datasets, all without modifying model parameters or post-processing outputs.
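The verification idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "generators" are simple contraction maps on vectors rather than real text or image models, `distance` is a Euclidean stand-in for $D$, and the decision rule (accept when the re-generation distance is below $\delta$) is one plausible reading of the threshold test, not necessarily the authors' exact procedure.

```python
import numpy as np

def make_generator(fixed_point, lipschitz=0.5):
    """Hypothetical toy generator: a contraction pulling inputs toward its own fixed point."""
    fixed_point = np.asarray(fixed_point, dtype=float)

    def generate(x):
        return fixed_point + lipschitz * (np.asarray(x, dtype=float) - fixed_point)

    return generate

def distance(a, b):
    """Euclidean stand-in for the distance metric D."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def regenerate(generator, x, k):
    """Apply the generator k times (iterative re-generation before release)."""
    for _ in range(k):
        x = generator(x)
    return x

def verify(sample, generator, delta):
    """Assumed decision rule: claim ownership if one more re-generation barely moves the sample."""
    return distance(sample, generator(sample)) < delta

authentic = make_generator([1.0, 2.0])    # plays the role of the authentic model
contrast = make_generator([-3.0, 0.5])    # plays the role of a contrasting model

first_output = np.array([4.0, -1.0])
released = regenerate(authentic, first_output, k=5)

d_auth = distance(released, authentic(released))
d_con = distance(released, contrast(released))
print(f"authentic: {d_auth:.4f}, contrasting: {d_con:.4f}")
```

In this toy setting the released sample sits close to the authentic generator's fixed point, so re-generating it with the authentic generator changes it far less than re-generating it with the contrasting one.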

Abstract

As machine- and AI-generated content proliferates, protecting the intellectual property of generative models has become imperative, yet verifying data ownership poses formidable challenges, particularly in cases of unauthorized reuse of generated data. The challenge of verifying data ownership is further amplified by using Machine Learning as a Service (MLaaS), which often functions as a black-box system. Our work is dedicated to detecting data reuse from even an individual sample. Traditionally, watermarking has been leveraged to detect AI-generated content. However, unlike watermarking techniques that embed additional information as triggers into models or generated content, potentially compromising output quality, our approach identifies latent fingerprints inherently present within the outputs through re-generation. We propose an explainable verification procedure that attributes data ownership through re-generation, and further amplifies these fingerprints in the generative models through iterative data re-generation. This methodology is theoretically grounded and demonstrates viability and robustness using recent advanced text and image generative models. Our methodology is significant as it goes beyond protecting the intellectual property of APIs and addresses important issues such as the spread of misinformation and academic misconduct. It provides a useful tool to ensure the integrity of sources and authorship, expanding its application in different scenarios where authenticity and ownership verification are essential.

Paper Structure (53 sections, 2 theorems, 15 equations, 26 figures, 17 tables, 2 algorithms)

Introduction
Stage I: Generation
Stage II: Verification
Related Work
Authorship Identification for Image Generation Models
Authorship Identification for Natural Language Generation Models
Background
Prompt-based Generation:
Paraphrasing Content:
Methodology
Data Generation and Verification Protocol
Authorship Verification through Contrastive Re-Generation
Enhancing Fingerprint through Iterative Re-Generation
Experiments
Experimental Setup
...and 38 more sections

Key Result

Theorem 1

The distance between the input and output of the $k$-th iteration is bounded by $D(x_{k+1}, x_k) \le L^k \, D(x_1, x_0)$, and the distance converges to 0 given $L \in (0,1)$. We estimated $L$ for Stable Diffusion models, with results presented in the appendix (label app:lipschitz_empirical_est).
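A short derivation shows why a contraction yields a geometric bound of this kind. The notation is assumed for illustration: $f$ denotes one re-generation pass, $x_{k+1} = f(x_k)$ with $x_0$ the initial output, and $f$ satisfies $D(f(x), f(y)) \le L \, D(x, y)$. This is the standard contraction argument and may differ in detail from the authors' proof.

```latex
\begin{aligned}
D(x_{k+1}, x_k) &= D\bigl(f(x_k), f(x_{k-1})\bigr) \le L \, D(x_k, x_{k-1}) \\
                &\le L^2 \, D(x_{k-1}, x_{k-2}) \le \dots \le L^k \, D(x_1, x_0), \\
\lim_{k \to \infty} L^k \, D(x_1, x_0) &= 0 \quad \text{for } L \in (0,1).
\end{aligned}
```

Each application of $f$ shrinks the step distance by at least a factor of $L$, so $k$ applications shrink the first step distance $D(x_1, x_0)$ by at least $L^k$.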

Figures (26)

Figure 1: The two-stage framework leveraging fingerprints in generative models. In the (I) Generation Stage, models generate output in traditional ways and optionally re-generate the output $k$ times prior to release. In the (II) Verification Stage, the authentication of data ownership is established by assessing the distance between the data and its re-generated version. This is illustrated using an authentic generator ($\mathcal{G}_a$) and a contrasting generator ($\mathcal{G}_c$), exemplified by models from OpenAI and Stability AI respectively.
Figure 2: The convergence analysis of the distances in iterations based on various metrics on the re-generated text of 200 samples from in-house datasets.
Figure 3: Density distribution of one-step re-generation among four text generation models, where the input to the one-step re-generation is the $k$-th iteration from the authentic models. The authentic models from left to right are: 1) M2M, 2) mBART, 3) GPT3.5-turbo, and 4) Cohere.
Figure 4: The convergence analysis of the distances in iterations based on CLIP and LPIPS metrics on re-generated images of 200 samples on Coco and Polo datasets.
Figure 5: Verifying data generated by authentic $\mathcal{G}_a$ SDv2.1 and SDv2.1 Base on the Coco dataset at various iterations using CLIP distance.
...and 21 more figures
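The convergence analyses in the captions above track how the distance between successive re-generations changes across iterations. A minimal sketch of that kind of measurement follows, using a hypothetical scalar contraction in place of a real generative model; the function names, the Lipschitz constant 0.6, the fixed point 2.0, and the ratio-based estimate of $L$ are all assumptions for illustration, not the authors' estimation procedure.

```python
def regenerate(x, lipschitz=0.6, fixed_point=2.0):
    """Hypothetical stand-in for one re-generation pass: a scalar contraction."""
    return fixed_point + lipschitz * (x - fixed_point)

def step_distances(x0, k):
    """Return [D(x_1, x_0), ..., D(x_k, x_{k-1})] with D as the absolute difference."""
    distances, x = [], x0
    for _ in range(k):
        nxt = regenerate(x)
        distances.append(abs(nxt - x))
        x = nxt
    return distances

dists = step_distances(x0=10.0, k=8)

# Ratio of consecutive step distances; the largest ratio is a crude empirical estimate of L.
ratios = [later / earlier for earlier, later in zip(dists, dists[1:])]
estimated_L = max(ratios)

for i, d in enumerate(dists, start=1):
    print(f"iteration {i}: step distance {d:.4f}")
print(f"estimated L = {estimated_L:.3f}")
```

For a real model the ratios would vary from sample to sample, so an estimate of this kind would be aggregated over many inputs rather than read off a single trajectory.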

Theorems & Definitions (4)

Definition 1: The Fixed Points of a Lipschitz Continuous Function
Theorem 1: The Convergence of Step Distance for $k$-th Re-generation
proof
Theorem 2: Banach Fixed-Point Theorem
