Embedding Space Selection for Detecting Memorization and Fingerprinting in Generative Models

Jack He, Jianxing Zhao, Andrew Bai, Cho-Jui Hsieh

TL;DR

This work examines data memorization in generative models through embedding-based measures. It focuses on the $C_T$-score computed from layer embeddings and on a trend across Vision Transformer (ViT) layers: the deeper the layer, the lower the measured memorization. It shows that $C_T$-scores from early ViT layers capture low-level memorization (colors, textures), while those from deeper layers capture high-level semantic information, and that each model architecture leaves a distinctive memorization fingerprint across layers. Building on this insight, the authors introduce a $C_T$-layer curve fingerprinting method that identifies generative-model authorship with up to 100% accuracy on full datasets and with substantial gains over baselines, without requiring access to training data. The findings support improved deepfake detection and content-provenance tracing, with practical implications for privacy, content integrity, and AI governance; the work also outlines limitations and directions for scalable, ethical deployment.
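As an illustration of what a $C_T$-score computed from layer embeddings involves, here is a minimal sketch. It assumes the $C_T$ statistic follows the data-copying test of Meehan et al. (2020), where it was introduced. Held-out test embeddings and generated embeddings are compared cell by cell by their distance to the nearest training embedding, using a standardized Mann-Whitney U statistic; the per-cell values are then averaged with weights equal to each cell's share of test samples. Values well below zero suggest generated samples sit closer to the training data than unseen real data do, i.e. memorization. All function names are hypothetical. The fixed `centroids` argument is a simplification standing in for the k-means partition of the training embeddings used in the original test.

```python
import math


def nearest_train_distance(x, train):
    """Distance from embedding x to its nearest neighbour in the training set."""
    return min(math.dist(x, t) for t in train)


def assign_cell(x, centroids):
    """Index of the closest centroid (a simple stand-in for a k-means partition)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def z_u(test_d, gen_d):
    """Standardized Mann-Whitney U over nearest-training distances.

    Negative values mean generated samples lie closer to the training set
    than held-out test samples do (a sign of data copying).
    """
    m, n = len(gen_d), len(test_d)
    # U counts (generated, test) pairs where the generated sample lies farther
    # from the training set; ties count 1/2.
    u = sum(1.0 if g > t else 0.5 if g == t else 0.0 for g in gen_d for t in test_d)
    mean_u = m * n / 2
    std_u = math.sqrt(m * n * (m + n + 1) / 12)
    return (u - mean_u) / std_u


def ct_score(train, test, gen, centroids, tau=0.0):
    """Test-weighted average of per-cell Z_U, restricted to cells holding at least
    a `tau` fraction of generated samples and at least one sample of each kind."""
    test_cells = [[] for _ in centroids]
    gen_cells = [[] for _ in centroids]
    for x in test:
        test_cells[assign_cell(x, centroids)].append(nearest_train_distance(x, train))
    for y in gen:
        gen_cells[assign_cell(y, centroids)].append(nearest_train_distance(y, train))
    num = den = 0.0
    for t_d, g_d in zip(test_cells, gen_cells):
        # Skip cells where Z_U is undefined or rests on too few generated samples.
        if not t_d or not g_d or len(g_d) / len(gen) < tau:
            continue
        weight = len(t_d) / len(test)  # share of test samples falling in this cell
        num += weight * z_u(t_d, g_d)
        den += weight
    if den == 0.0:
        raise ValueError("no cell has both test samples and enough generated samples")
    return num / den


def ct_layer_curve(layers, centroids_per_layer):
    """One C_T value per encoder layer; each item of `layers` is (train, test, gen) embeddings."""
    return [ct_score(tr, te, ge, c) for (tr, te, ge), c in zip(layers, centroids_per_layer)]
```

Calling `ct_score` once per encoder layer, on that layer's embeddings, is what `ct_layer_curve` does; the resulting list is one way to represent a $C_T$-layer curve.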

Abstract

In the rapidly evolving landscape of artificial intelligence, generative models such as Generative Adversarial Networks (GANs) and Diffusion Models have become cornerstone technologies, driving innovation in fields from art creation to healthcare. Despite their potential, these models face the significant challenge of data memorization, which poses risks to privacy and to the integrity of generated content. Among the various metrics for detecting memorization, our study focuses on memorization scores calculated from encoder layer embeddings, which involve measuring distances between samples in the embedding spaces. In particular, we find that memorization scores calculated from the layer embeddings of Vision Transformers (ViTs) show a notable trend: the deeper the layer, the less memorization is measured. Memorization scores from early-layer embeddings are more sensitive to low-level memorization (e.g., colors and simple patterns in an image), while those from later layers are more sensitive to high-level memorization (e.g., the semantic meaning of an image). We also observe that each model architecture has a unique degree of memorization at different levels of information, which can be viewed as an inherent property of the architecture. Building on this insight, we introduce a fingerprinting methodology that exploits the distinctive distribution of memorization scores across ViT layers, providing a novel approach to identifying the models involved in generating deepfakes and malicious content. Our approach achieves a 30% improvement in identification accuracy over existing baseline methods, offering a more effective tool for combating digital misinformation.
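The abstract does not spell out how a model is identified from its per-layer memorization scores. Figure 6's caption compares checkpoints by the cosine similarity of their $C_T$-layer curves, so this minimal sketch assumes a nearest-reference rule under that similarity: the query curve is attributed to the candidate model whose reference curve is most similar. The function names are hypothetical, and the curves below are toy values chosen for illustration, not results from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length C_T-layer curves."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attribute_model(query_curve, reference_curves):
    """Name of the candidate model whose reference curve is most similar to the query."""
    return max(reference_curves,
               key=lambda name: cosine_similarity(query_curve, reference_curves[name]))


# Toy curves for illustration only (one C_T value per ViT layer), not data from the paper.
references = {
    "model_a": [-3.0, -2.0, -1.0, 0.0],
    "model_b": [-1.0, -1.5, -2.5, -3.5],
}
print(attribute_model([-2.9, -2.1, -0.9, 0.1], references))  # prints "model_a"
```

Cosine similarity compares curve shape rather than overall scale, which matches the idea that the pattern of memorization across layers, not its absolute level, is the fingerprint; other distances would be equally valid choices under this sketch.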

Paper Structure (19 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: ViT encoder trend on DDPM-generated images. We use 500 randomly sampled DDPM-generated (ho2020denoising) CIFAR-10 images to compute $C_T$ scores with three Vision Transformers: "vit-base-patch16", "vit-large-patch16", and "vit-huge-patch14". We observe a consistently increasing trend in $C_T$ across layers.
  • Figure 3: $C_T$ score comparison of ViT-based and CNN encoders. This figure illustrates the $C_T$ trends for Vision Transformer, DINO ViT (caron2021emerging), DeiT (touvron2021training), and Swin Transformer (liu2021swin) encoders, compared with ResNet (he2015deep) and InceptionV3 (heusel2018gans) encoders. Each plot is computed from 500 GAN-generated images, under settings similar to those of the figure labeled fig:GAN500ct.
  • Figure 4: Samples from the curated datasets
  • Figure 5: $C_T$ vs. layer curves for six different datasets
  • Figure 6: $C_T$-scores of multiple checkpoints of DDPM and DDIM. (a) Raw $C_T$-layer curves of the checkpoints at training epochs 150, 180, 210, 240, and 270 for both DDPM and DDIM. (b) Heat map of all pairwise cosine similarities between checkpoints. The heat map is symmetric: the 1st and 3rd quadrants show inter-architecture similarities, and the 2nd and 4th quadrants show intra-architecture similarities. Diagonal entries are crossed out because they are each checkpoint's similarity with itself (always 1).
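The checkpoint comparison described for Figure 6(b) can be sketched as follows, assuming each checkpoint is summarized by its $C_T$-layer curve (a list of per-layer $C_T$ values) and compared with plain cosine similarity. `similarity_matrix` builds the symmetric heat-map values, and `mean_intra_inter` averages the off-diagonal entries within and across architectures, skipping self-similarities just as the caption's crossed-out diagonal does. Both function names are hypothetical, and the test curves are toy values, not data from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length C_T-layer curves."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(curves):
    """All pairwise cosine similarities; symmetric, with ones on the diagonal."""
    return [[cosine_similarity(a, b) for b in curves] for a in curves]


def mean_intra_inter(curves, arch_labels):
    """Mean similarity within and across architectures, excluding self-pairs."""
    intra, inter = [], []
    # Each unordered pair once: the matrix is symmetric, so this gives the same means.
    for i, j in combinations(range(len(curves)), 2):
        s = cosine_similarity(curves[i], curves[j])
        (intra if arch_labels[i] == arch_labels[j] else inter).append(s)
    return sum(intra) / len(intra), sum(inter) / len(inter)
```

If the architecture fingerprint holds, the intra-architecture mean should exceed the inter-architecture mean, which is the pattern the heat map is meant to reveal.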