DeepTaster: Adversarial Perturbation-Based Fingerprinting to Identify Proprietary Dataset Use in Deep Neural Networks
Seonhye Park, Alsharif Abuadbba, Shuo Wang, Kristen Moore, Yansong Gao, Hyoungshick Kim, Surya Nepal
TL;DR
DeepTaster addresses the problem of verifying ownership of proprietary training data in DNNs when model architectures may differ due to theft or transfer learning. It introduces a two-stage, architecture-insensitive fingerprinting approach that builds a one-class DeepSVDD classifier from adversarial perturbations transformed into the Discrete Fourier Transform domain and then verifies suspect models using these features. Across CIFAR10, MNIST, and Tiny-ImageNet with ResNet18, VGG16, and DenseNet161, DeepTaster demonstrates robust detection under eight attack scenarios, often outperforming the prior DeepJudge method, and shows practical feasibility with measurable latency improvements. The work establishes a non-invasive, data-centric method for dataset provenance, offering strong protection against model and dataset theft with broad applicability in ML-as-a-Service contexts.
Abstract
Training deep neural networks (DNNs) requires large datasets and powerful computing resources, which has led some owners to restrict redistribution without permission. Watermarking techniques that embed confidential data into DNNs have been used to protect ownership, but these can degrade model performance and are vulnerable to watermark removal attacks. Recently, DeepJudge was introduced as an alternative approach to measuring the similarity between a suspect and a victim model. While DeepJudge shows promise in addressing the shortcomings of watermarking, it primarily addresses situations where the suspect model copies the victim's architecture. In this study, we introduce DeepTaster, a novel DNN fingerprinting technique, to address scenarios where a victim's data is unlawfully used to build a suspect model. DeepTaster can effectively identify such DNN model theft attacks, even when the suspect model's architecture deviates from the victim's. To accomplish this, DeepTaster generates adversarial images with perturbations, transforms them into the Fourier frequency domain, and uses these transformed images to identify the dataset used in a suspect model. The underlying premise is that adversarial images can capture the unique characteristics of DNNs built with a specific dataset. To demonstrate the effectiveness of DeepTaster, we evaluated the effectiveness of DeepTaster by assessing its detection accuracy on three datasets (CIFAR10, MNIST, and Tiny-ImageNet) across three model architectures (ResNet18, VGG16, and DenseNet161). We conducted experiments under various attack scenarios, including transfer learning, pruning, fine-tuning, and data augmentation. Specifically, in the Multi-Architecture Attack scenario, DeepTaster was able to identify all the stolen cases across all datasets, while DeepJudge failed to detect any of the cases.
