Training-free Source Attribution of AI-generated Images via Resynthesis

Pietro Bongini, Valentina Molinari, Andrea Costanzo, Benedetta Tondi, Mauro Barni

TL;DR

The paper tackles synthetic image attribution (SIA) under data scarcity by introducing a training-free, one-shot approach based on image resynthesis. The method describes the image content with a caption, resynthesizes it with every candidate generator, and measures distances in CLIP feature space. It then attributes the image to the generator whose resynthesis is closest, formalized as $j^* = \arg\min_j d\big(E(s_j(p(I))), E(I)\big)$, where $p(I)$ is the caption of image $I$, $s_j$ is the $j$-th candidate source, $E$ is the CLIP image encoder, and $d$ is a distance in that feature space. A new dataset with 14 generators (including 7 commercial ones) and their resyntheses is proposed to benchmark few-shot and zero-shot SIA, enabling direct comparison with baselines such as CLIP-based classifiers, De-Fake, CLIP-LoRA, EfficientNetB4, and Tiny Autoencoders. Results show that the resynthesis method excels in low-shot regimes (1–10 shots), while standard fine-tuned approaches dominate when more data is available, highlighting a valuable trade-off for practical attribution scenarios. The dataset and findings offer a principled benchmark for developing robust, data-efficient SIA methods and motivate exploring richer distance metrics and alternative secondary-description strategies.
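The decision rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the captioner $p$, the generators $s_j$ and the encoder $E$ are passed in as hypothetical callables (in practice a captioning model, each candidate generator's text-to-image interface, and a CLIP image encoder), and cosine distance is assumed for $d$, since the text only says the distance is measured in CLIP feature space. The returned index is 0-based, whereas Figure 1 numbers sources from 1.

```python
from typing import Any, Callable, Sequence

import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Assumed choice of d: 1 minus the cosine similarity of two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def attribute(
    image: Any,
    caption_fn: Callable[[Any], Any],          # p: image -> prompt (hypothetical captioner)
    sources: Sequence[Callable[[Any], Any]],   # s_j: prompt -> resynthesized image (hypothetical)
    encode_fn: Callable[[Any], np.ndarray],    # E: image -> feature vector (e.g. a CLIP encoder)
    distance: Callable[[np.ndarray, np.ndarray], float] = cosine_distance,
) -> int:
    """Return the 0-based index j* of the source whose resynthesis is closest to the image."""
    prompt = caption_fn(image)                 # p(I)
    target = encode_fn(image)                  # E(I)
    # d(E(s_j(p(I))), E(I)) for every candidate source j
    dists = [distance(encode_fn(s(prompt)), target) for s in sources]
    return int(np.argmin(dists))               # arg min over j


if __name__ == "__main__":
    # Toy stand-ins: "images" are plain vectors and E and p are identity maps.
    img = np.array([1.0, 2.0, 3.0])
    toy_sources = [lambda p: np.array([3.0, -1.0, 0.0]), lambda p: 2.0 * p, lambda p: -p]
    print(attribute(img, lambda x: x, toy_sources, lambda x: x))  # → 1
```

In the toy run, the second "generator" reproduces the input direction exactly, so its cosine distance is (numerically) zero and it wins the arg min.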

Abstract

Synthetic image source attribution is a challenging task, especially under data-scarcity conditions that require few-shot or zero-shot classification capabilities. We present a new training-free one-shot attribution method based on image resynthesis. A prompt describing the image under analysis is generated and then used to resynthesize the image with every candidate source. The image is attributed to the model that produced the resynthesis closest to the original image in a suitable feature space. We also introduce a new dataset for synthetic image attribution consisting of face images from commercial and open-source text-to-image generators. The dataset provides a challenging attribution framework, useful for developing new attribution models and testing their capabilities on different generative architectures. The dataset structure allows testing approaches based on resynthesis and comparing them to few-shot methods. Results from state-of-the-art few-shot approaches and other baselines show that the proposed resynthesis method outperforms existing techniques when only a few samples are available for training or fine-tuning. The experiments also show that the new dataset is challenging and represents a valuable benchmark for developing and evaluating future few-shot and zero-shot methods.
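Figure 4 reads this trade-off as the number of shots at which a baseline's accuracy curve crosses the constant accuracy of the training-free method. A small helper for reading that crossover off one's own measured curves might look like the following; the function name and the dictionary input format are illustrative assumptions, and the answer is only resolved to the shot counts actually evaluated (there is no interpolation between them).

```python
from typing import Mapping, Optional


def shots_to_outperform(
    resynthesis_acc: float,
    baseline_acc: Mapping[int, float],
) -> Optional[int]:
    """Smallest evaluated shot count at which a baseline strictly beats the
    (shot-independent) accuracy of the training-free method.

    baseline_acc maps number of shots -> measured accuracy (hypothetical format).
    Returns None if the baseline never overtakes within the evaluated range.
    """
    for shots in sorted(baseline_acc):
        if baseline_acc[shots] > resynthesis_acc:
            return shots
    return None
```

Because the resynthesis accuracy does not depend on the number of shots, a single scalar suffices on that side of the comparison.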

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Workflow of the proposed source attribution method based on resynthesis. The original image $I$ in this example was generated by Freepik. From left to right, the resyntheses $r_{j}$ were produced by: Bing, Firefly, Flux-dev, Freepik, Imagen3, Leonardo AI, Midjourney, Nightcafe, Stable Diffusion 3, Starry AI. The features of $I$ and of all the $r_{j}$ are extracted with CLIP. The distances between the projection $E(I)$ and each $E(r_j)$ in CLIP's feature space are measured, and $I$ is attributed to the source $s_{j^*}$ with the minimum distance $d(E(I), E(r_{j^*}))$. In this example, the image is correctly attributed to $s_4$, which corresponds to Freepik.
  • Figure 2: Dataset building procedure. This example corresponds to character #42: Aragorn (The Lord of the Rings). From top to bottom: the primary description is used as a prompt for the 10 core generative models introduced in the generators subsection, which produce the images in the first row. The models are in alphabetical order, from left to right. Each image is then passed to ChatGPT to obtain a secondary description. The secondary description is used to generate the resyntheses in the bottom double-column groups, again using all 10 core generators. Within each group of resyntheses, from left to right and from top to bottom, the generators are again in alphabetical order.
  • Figure 3: Dataset composition and split. Each of the $100$ characters is generated with all the sources, obtaining $1,000$ original images. Their secondary descriptions are extracted with ChatGPT and used to generate $10$ resyntheses each, one with each source. These are the components of the main dataset. The dataset is split along the character index, meaning that all the $10$ original images of each character and their $100$ resyntheses belong to the same set. The dataset extension (Ex) is then obtained by taking the test set and adding the other $4$ sources. This produces $40$ new original images, along with their $40\times14=560$ resyntheses and $400$ resyntheses of the existing original images with the $4$ new sources.
  • Figure 4: Accuracy in a few-shot scenario (T1) with different numbers of classes. For the experiments with 12 and 14 classes, the test-only sources were used. Accuracy is plotted against the number of shots available for training. Our resynthesis method is the navy-blue line. The performance of our training-free method does not depend on the number of shots and is therefore constant. The intersection between our method's accuracy line and a baseline's accuracy line corresponds to the number of shots that baseline needs to outperform our training-free (one-shot equivalent) method.
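The image counts in Figure 3 follow from simple products, which the sketch below recomputes. The function and parameter names are hypothetical, and the test-split size of 10 characters is inferred from the caption (40 new original images from 4 new sources) rather than stated explicitly.

```python
def dataset_counts(
    n_characters: int = 100,      # characters in the main dataset
    n_core: int = 10,             # core generators
    n_test_characters: int = 10,  # inferred: 40 new originals / 4 new sources
    n_new: int = 4,               # sources added only in the extension (Ex)
) -> dict:
    n_all = n_core + n_new
    originals = n_characters * n_core       # one original per (character, core source)
    resyntheses = originals * n_core        # every original resynthesized by every core source
    ext_originals = n_test_characters * n_new           # test characters rendered by new sources
    ext_resyn_new = ext_originals * n_all               # new originals resynthesized by all 14 sources
    ext_resyn_old = n_test_characters * n_core * n_new  # existing test originals x new sources
    return {
        "originals": originals,
        "resyntheses": resyntheses,
        "ext_originals": ext_originals,
        "ext_resyntheses_of_new_originals": ext_resyn_new,
        "ext_resyntheses_of_existing_originals": ext_resyn_old,
    }


print(dataset_counts())
```

With the defaults this gives 1,000 originals and 10,000 resyntheses in the main dataset, and 40, 560 and 400 images for the three extension components, matching the caption.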