Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?

Likun Zhang, Hao Wu, Lingcui Zhang, Fengyuan Xu, Jin Cao, Fenghua Li, Ben Niu

TL;DR

An injection-free training data attribution method that identifies whether a model's training data stems from a given source model without adding watermarks to the source model, together with a statistical-level attribution method that uses the shadow model technique to train an attribution discriminator.

Abstract

The emergence of text-to-image models has recently sparked significant interest, but it is accompanied by a looming risk of infringement through violation of the user terms. Specifically, an adversary may exploit data created by a commercial model to train their own model without proper authorization. To address this risk, it is crucial to investigate the attribution of a suspicious model's training data by determining whether that data originates, wholly or partially, from a specific source model. To trace the generated data, existing methods require applying extra watermarks during either the training or inference phase of the source model. However, these methods are impractical for pre-trained models that have already been released, especially when model owners lack security expertise. To tackle this challenge, we propose an injection-free training data attribution method for text-to-image models. It can identify whether a suspicious model's training data stems from a source model, without additional modifications to the source model. The crux of our method lies in the inherent memorization characteristic of text-to-image models. Our core insight is that the memorization of the training dataset is passed down through the data generated by the source model to the model trained on that data, making the source model and the infringing model exhibit consistent behaviors on specific samples. Therefore, our approach involves developing algorithms to uncover these distinct samples and using them as inherent watermarks to verify whether a suspicious model originates from the source model. Our experiments demonstrate that our method achieves an accuracy of over 80% in identifying the source of a suspicious model's training data, without interfering with the original training or generation process of the source model.
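
The verification idea in the abstract, checking whether a suspicious model behaves consistently with the source model on specific key samples, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes that for each key sample we already have a precomputed distance between the sample and the suspicious model's output for the corresponding prompt, and it flags the model when a large enough fraction of key samples is reproduced closely. The function name `attribute_by_key_samples`, the parameters `dist_threshold` and `vote_threshold`, and all numbers are hypothetical.

```python
from typing import Sequence


def attribute_by_key_samples(
    distances: Sequence[float],
    dist_threshold: float = 0.1,  # hypothetical: max distance counted as "reproduced"
    vote_threshold: float = 0.5,  # hypothetical: fraction of key samples required
) -> bool:
    """Return True if the suspicious model reproduces enough key samples.

    distances[i] is an assumed, precomputed distance between key sample i
    and the suspicious model's output for that sample's prompt.
    """
    if not distances:
        raise ValueError("need at least one key sample")
    # Count key samples that the suspicious model reproduces closely.
    reproduced = sum(1 for d in distances if d <= dist_threshold)
    return reproduced / len(distances) >= vote_threshold


# Toy numbers for illustration only (not results from the paper).
infringing = [0.03, 0.05, 0.20, 0.04]  # 3 of 4 key samples reproduced
innocent = [0.40, 0.35, 0.08, 0.50]    # 1 of 4 key samples reproduced
print(attribute_by_key_samples(infringing))  # True
print(attribute_by_key_samples(innocent))    # False
```

A vote over many key samples, rather than a single sample, is used here because an innocent model may match one key sample by chance.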

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: The task of training data attribution. An adversary may produce some prompts (❶) and query the source model (❷), then collect the generated images to train a model (❸). The source model owner wants to investigate whether a model is trained on the data generated by the source model (❹). Note that the suspicious model may be an innocent one. (See Section \ref{sec_pre} for details.)
  • Figure 2: User terms of commercial text-to-image models.
  • Figure 3: Our research question. One-hop attribution is well studied in the field of data watermarking. Our paper attempts to solve two-hop attribution in a real-world generation setting.
  • Figure 4: Our core insight. In the open-vocabulary generation task, the source model can generate data in different distributions. From the viewpoint of a model extraction attack, the infringing model may extract all or part of the distributions of the source model. The $\epsilon$ in Eq. \ref{eq:extraction} indicates the difference between the extracted distribution and the source distribution.
  • Figure 5: Two strategies for key sample preparation in instance-level solutions. Strategy 1 is detection-based and aims to select key samples directly from the source model's training dataset. Strategy 2 is generation-based and aims to synthesize key samples by maximizing the similarity between the source and suspicious models. Note that neither strategy requires a model update.
  • ...and 2 more figures
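
Strategy 1 in the Figure 5 caption (detection-based selection of key samples from the source model's training dataset) can be sketched as a simple ranking step. The caption does not state the selection criterion. This sketch assumes, as an illustration only, that each training sample has a precomputed reconstruction distance under the source model and that a small distance signals strong memorization. The function name `select_key_samples` and the numbers are hypothetical.

```python
from typing import List, Sequence


def select_key_samples(recon_distances: Sequence[float], k: int) -> List[int]:
    """Return indices of the k training samples the source model reconstructs best.

    Assumption (not from the paper): a small reconstruction distance
    indicates that the source model has memorized the sample.
    """
    if k < 0 or k > len(recon_distances):
        raise ValueError("k must be between 0 and the number of samples")
    # Rank sample indices by ascending reconstruction distance.
    order = sorted(range(len(recon_distances)), key=lambda i: recon_distances[i])
    return order[:k]


# Toy distances for five training samples (illustrative only).
print(select_key_samples([0.30, 0.02, 0.15, 0.01, 0.40], k=2))  # [3, 1]
```

No model is updated in this step, which matches the caption's note that both strategies work without model updates.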