Model Synthesis for Zero-Shot Model Attribution
Tianyun Yang, Juan Cao, Danding Wang, Chang Xu
TL;DR
This work tackles zero-shot model attribution by moving beyond closed-set classifiers to a fingerprint-distance framework trained on a large corpus of synthetic models. It introduces a model-synthesis strategy that yields 5760 shallow, diverse synthetic architectures to emulate real model fingerprints, and trains a ResNet50-based fingerprint extractor with a joint cross-entropy (CE) and triplet loss, with spectral fingerprints extracted by applying the Discrete Fourier Transform to denoised images. The authors define the Fréchet Frequency Distance (FFD) to quantify fidelity between real and synthetic fingerprint distributions and demonstrate strong zero-shot generalization across GANs, VAEs, flows, and diffusion models, including Stable Diffusion and DALL-E variants, with substantial gains on unseen models in both model identification (>40%) and verification (>15%). The approach also traces LoRA-based variants back to their base models, offering a scalable defense against IP infringement and allowing the gallery to be updated dynamically without retraining on new real models.
Abstract
Generative models are now shaping fields such as art, design, and human-computer interaction, but they also bring challenges related to copyright infringement and content management. In response, existing research seeks to identify the unique fingerprints that generative models leave on the images they generate, which can be leveraged to attribute generated images to their source models. Existing methods, however, can only identify models within the static set included in classifier training, and cannot adapt dynamically to newly emerging, unseen models. To bridge this gap, we aim to develop a generalized model fingerprint extractor capable of zero-shot attribution, that is, of attributing unseen models without any exposure to them during training. Central to our method is a model synthesis technique, which generates numerous synthetic models that mimic the fingerprint patterns of real-world generative models. The design of the synthesis technique is motivated by observations on how the basic architectural building blocks and parameters of generative models influence fingerprint patterns, and it is validated through two purpose-designed metrics that examine the fidelity and diversity of the synthetic models. Our experiments demonstrate that this fingerprint extractor, trained solely on synthetic models, achieves impressive zero-shot generalization on a wide range of real-world generative models, improving model identification and verification accuracy on unseen models by over 40% and 15%, respectively, compared to existing approaches.
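To make the zero-shot setting concrete, the following is a minimal sketch of how identification and verification could operate on top of a fingerprint extractor. It is an illustration under stated assumptions, not the authors' implementation: the fingerprints are hand-written toy vectors standing in for extractor outputs, and the cosine distance, the threshold value, and the names `identify`, `verify`, and `gallery` are all hypothetical choices made for this example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two fingerprint vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def identify(query, gallery):
    """Identification: return the gallery model with the closest fingerprint."""
    return min(gallery, key=lambda name: cosine_distance(query, gallery[name]))


def verify(query, reference, threshold=0.2):
    """Verification: accept if the two fingerprints are close enough.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    return cosine_distance(query, reference) <= threshold


# Toy reference fingerprints; in practice these would come from the extractor
# applied to images generated by each model.
gallery = {
    "model_a": [1.0, 0.0, 0.0],
    "model_b": [0.0, 1.0, 0.0],
}

query = [0.9, 0.1, 0.0]
best = identify(query, gallery)

# A newly released model is handled by adding its fingerprint to the gallery;
# the extractor itself is left untouched.
gallery["model_c"] = [0.0, 0.0, 1.0]
best_after = identify([0.1, 0.0, 0.95], gallery)
```

Because attribution reduces to distance comparisons against stored fingerprints, supporting a new model in this sketch only requires a new gallery entry, which is the property that separates this setting from a closed-set classifier.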
