Riemannian-Geometric Fingerprints of Generative Models
Hae Jin Song, Laurent Itti
TL;DR
This work introduces a principled Riemannian-geometry framework to fingerprint and attribute generative models on non-Euclidean data. By learning a latent Riemannian metric via a VAE and using geodesic distances and a gradient-based Riemannian center of mass, it defines artifacts and fingerprints that robustly distinguish a wide range of generative methods across multiple datasets and modalities. Empirically, the proposed Riemannian fingerprints outperform Euclidean baselines and show strong generalization to unseen datasets, model types, and vision-language models, with a ResNet-based classifier leveraging artifact features for attribution. The approach offers a practical, geometry-aware toolkit for model provenance and IP protection in real-world, multimodal settings.
Abstract
Recent breakthroughs and rapid integration of generative models (GMs) have sparked interest in the problem of model attribution and model fingerprints. For instance, service providers need reliable methods of authenticating their models to protect their IP, while users and law enforcement seek to verify the source of generated content for accountability and trust. In addition, the threat of model collapse is growing as more model-generated data are fed back into sources (e.g., YouTube) that are often harvested for training ("regurgitative training"), heightening the need to differentiate synthetic from human data. Yet a gap still exists in understanding the fingerprints of generative models. We believe this gap stems from the lack of a formal framework that can define, represent, and analyze fingerprints in a principled way. To address it, we take a geometric approach and propose a new definition of the artifacts and fingerprints of GMs using Riemannian geometry, which allows us to leverage the rich theory of differential geometry. Our new definition generalizes previous work (Song et al., 2024) to non-Euclidean manifolds by learning Riemannian metrics from data and replacing the Euclidean distances and nearest-neighbor search with geodesic distances and a kNN-based Riemannian center of mass. Building on this theory, we develop a new gradient-based algorithm for computing the fingerprints in practice. Results show that our fingerprints are more effective at distinguishing a large array of GMs, spanning 4 datasets at 2 resolutions (64 by 64, 256 by 256), 27 model architectures, and 2 modalities (Vision, Vision-Language). Using our proposed definition significantly improves performance on model attribution, as well as generalization to unseen datasets, model types, and modalities, suggesting its practical efficacy.
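
To make the two geometric ingredients named above more concrete (geodesic distances and a kNN-based Riemannian center of mass computed by gradient steps), the following is a minimal toy sketch in Python. It is not the authors' implementation. As a loudly stated assumption, it replaces the learned latent Riemannian metric with the unit sphere, where geodesics have a closed form. All function names (`geodesic_distance`, `log_map`, `exp_map`, `riemannian_center_of_mass`, `knn_center_of_mass`) and the parameters `step`, `iters`, `tol`, and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np

# ASSUMPTION: the manifold is the unit sphere, used as a stand-in for a
# learned Riemannian metric, because its geodesics are known in closed form.

def geodesic_distance(x, y):
    """Great-circle distance between two unit vectors."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def log_map(base, point):
    """Tangent vector at `base` pointing toward `point`; its length is the geodesic distance."""
    direction = point - np.dot(base, point) * base  # project onto tangent space at base
    norm = np.linalg.norm(direction)
    if norm < 1e-12:  # identical (or antipodal) points: no unique direction
        return np.zeros_like(base)
    return geodesic_distance(base, point) * direction / norm

def exp_map(base, tangent):
    """Follow the geodesic that starts at `base` with initial velocity `tangent`."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def riemannian_center_of_mass(points, step=1.0, iters=100, tol=1e-10):
    """Gradient descent on half the mean squared geodesic distance (Karcher mean)."""
    mean = points.mean(axis=0)
    mean /= np.linalg.norm(mean)  # initialize at the normalized Euclidean mean
    for _ in range(iters):
        # The mean of the log maps is the negative Riemannian gradient of the objective.
        descent = np.mean([log_map(mean, p) for p in points], axis=0)
        if np.linalg.norm(descent) < tol:
            break
        mean = exp_map(mean, step * descent)
    return mean

def knn_center_of_mass(query, reference, k=3):
    """Center of mass of the k reference points closest to `query` in geodesic distance."""
    dists = np.array([geodesic_distance(query, r) for r in reference])
    nearest = np.argsort(dists)[:k]
    return riemannian_center_of_mass(reference[nearest])

# Toy data: four points placed symmetrically around the north pole, plus two far points.
t = 0.3
ring = [[np.sin(t) * np.cos(a), np.sin(t) * np.sin(a), np.cos(t)]
        for a in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
reference = np.array(ring + [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
query = np.array([0.0, 0.0, 1.0])
center = knn_center_of_mass(query, reference, k=4)
```

In this toy setting the Euclidean nearest-neighbor lookup is replaced by a neighborhood average that stays on the manifold. With a learned metric, the closed-form `log_map` and `exp_map` would presumably have to be replaced by numerical geodesic computations, which this sketch does not attempt.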
