Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?

Yuechen Xie, Jie Song, Huiqiong Wang, Mingli Song

TL;DR

This work addresses how to verify whether a black-box suspicious model was trained on synthetic data from a defender's text-to-image model, focusing on Case 3, where the task may differ. It introduces TrainProVe, a three-stage approach that generates a shadow dataset from the defender’s generative model, trains a shadow model on it, and uses a one-sided Grubbs test to compare the suspect model’s performance on a validation set. The method is grounded in the generalization error bound: the closer two models' training data distributions are, the more similar their generalization ability, which makes provenance inference possible. Empirically, TrainProVe achieves over 99% verification accuracy across multiple datasets and four text-to-image models, while remaining robust to architectural and hyperparameter variations and running more efficiently than prior approaches. Overall, the approach provides a practical, scalable tool for protecting the intellectual property of open-source generative models in black-box scenarios.
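The one-sided Grubbs test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the defender has several shadow-model accuracies on the validation set and asks whether the suspect's accuracy is a significantly low outlier among them. The function names, the sample accuracies, and the choice to test the suspect's value directly (rather than the sample minimum) are all hypothetical. The Student-t quantile is computed with the standard library only, so no external packages are needed.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=4000):
    """Student-t CDF for x >= 0, integrated with Simpson's rule."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_upper_quantile(p, df):
    """Return t such that P(T > t) = p (for p < 0.5), by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n, alpha=0.05):
    """One-sided Grubbs critical value for a sample of size n."""
    t = t_upper_quantile(alpha / n, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def is_low_outlier(shadow_accs, suspect_acc, alpha=0.05):
    """True if suspect_acc is a significantly low value in the pooled sample.

    Assumption: the suspect's accuracy is pooled with the shadow accuracies
    and tested directly, instead of testing the sample minimum.
    """
    sample = list(shadow_accs) + [suspect_acc]
    s = stdev(sample)
    if s == 0:
        return False  # all values identical, nothing can be an outlier
    g = (mean(sample) - suspect_acc) / s
    return g > grubbs_critical(len(sample), alpha)


# Hypothetical shadow-model accuracies on the validation set.
shadow = [0.81, 0.79, 0.80, 0.82, 0.78]
print(is_low_outlier(shadow, 0.30))  # suspect performs far worse
print(is_low_outlier(shadow, 0.80))  # suspect performs like the shadows
```

Under this reading, a suspect whose accuracy is not a low outlier behaves like a model trained on the defender's synthetic data, while a significantly lower accuracy suggests a different training source. The significance level `alpha` is a free parameter here; the value the paper uses is not stated in this summary.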

Abstract

High-quality open-source text-to-image models have significantly lowered the threshold for obtaining photorealistic images, but they also face potential risks of misuse. Specifically, suspects may use synthetic data generated by these models to train models for specific tasks without permission, especially when they lack real data. Protecting these generative models is crucial for the well-being of their owners. In this work, we propose the first method to address this important yet unresolved issue, called Training data Provenance Verification (TrainProVe). The rationale behind TrainProVe is grounded in the principle of the generalization error bound, which suggests that, for two models with the same task, the smaller the distance between their training data distributions, the closer their generalization ability will be. We validate the efficacy of TrainProVe across four text-to-image models (Stable Diffusion v1.4, latent consistency model, PixArt-$\alpha$, and Stable Cascade). The results show that TrainProVe achieves a verification accuracy of over 99% in determining the provenance of suspicious model training data, surpassing all previous methods. Code is available at https://github.com/xieyc99/TrainProVe.

Paper Structure

This paper contains 45 sections, 2 theorems, 14 equations, 5 figures, 22 tables, 1 algorithm.

Key Result

Theorem 1

Assume $M$ is trained on synthetic data generated by a text-to-image model $G$ with a set of text prompts $\mathcal{T}_1$, i.e., $P_1(\bm{x}) = P(\bm{x} | G, \mathcal{T}_1)$. $\hat{M}$ can be trained on either real data or synthetic data generated by any text-to-image model. Based on the generalization error bound, [...] where $\Delta\epsilon_T$ represents the difference in generalization error between $M$ and $\hat{M}$ [...]

Figures (5)

  • Figure 1: The overview of the three cases. These three cases encompass nearly every conceivable real-world scenario, where a suspect uses synthetic images from the defender's text-to-image model illegally. In this paper, we are committed to addressing the security risks in Case 3.
  • Figure 2: The overview of TrainProVe's motivation. $M$ is trained on the dataset whose data distribution is $P(x|G,\mathcal{T}_1)$. $\hat{M}_{1}$, $\hat{M}_{2}$, and $\hat{M}_{3}$ are $\hat{M}$ trained on three datasets with different data distributions, respectively. The source and target domain data are sampled from their respective data distributions. $G$ and $G'$ are two different text-to-image models, and $\mathcal{T}_1$, $\mathcal{T}_2$, and $\mathcal{T}_t$ are distinct sets of text prompts.
  • Figure 3: The complete process of TrainProVe.
  • Figure 4: Changing $M_{sdw}$'s training epochs. "Avg. Acc", "Avg. F1", and "Avg. AUC" are the average values of accuracy, F1 score, and AUROC under four different $G_d$.
  • Figure 5: Changing the sample sizes of $\mathcal{D}_{sdw}$ and $\mathcal{D}_{v}$. "Avg. Acc", "Avg. F1", and "Avg. AUC" are the average values of accuracy, F1 score, and AUROC under four different $G_d$.

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • proof