Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
Adi Haviv, Shahar Sarfaty, Uri Hacohen, Niva Elkin-Koren, Roi Livni, Amit H Bermano
TL;DR
This paper tackles the challenge of quantifying originality in text-to-image diffusion models by proposing a token-count-based measure derived from multi-token textual inversion. Through controlled synthetic generalization experiments and real-world domain tests, it demonstrates that models reconstruct familiar concepts with fewer tokens, while original or unseen content requires more tokens for accurate reconstruction, and that this token count correlates with perceived originality. The approach combines Stable Diffusion mechanics, multi-token textual inversion, and DreamSim-based reconstruction evaluation to assess originality without relying on training data prompts or data disclosure. The findings suggest that model familiarity underpins originality signals and have implications for copyright analysis, model auditing, and the responsible deployment of generative content. Overall, the work provides a practical, distribution-aware framework for assessing originality in generative models and highlights the value of dataset diversity for fostering creative output within legal and ethical boundaries.
Abstract
This work addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models, with a focus on copyright originality. We begin by evaluating T2I models' ability to innovate and generalize through controlled experiments, revealing that Stable Diffusion models can effectively recreate unseen elements when trained on sufficiently diverse data. Our key insight is that concepts and combinations of image elements that the model is familiar with, and saw more often during training, are more concisely represented in the model's latent space. We hence propose a method that leverages textual inversion to measure the originality of an image based on the number of tokens the model requires to reconstruct it. Our approach is inspired by legal definitions of originality and aims to assess whether a model can produce original content without relying on specific prompts or on access to the model's training data. We demonstrate our method using both a pre-trained Stable Diffusion model and a synthetic dataset, showing a correlation between the number of tokens and image originality. This work contributes to the understanding of originality in generative models and has implications for copyright infringement cases.
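
As a rough illustration of the token-count idea, the sketch below picks the smallest number of tokens whose reconstruction is close enough to the target image. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name tokens_needed, the threshold rule, and all distance values are hypothetical, and the distances stand in for a perceptual reconstruction distance (such as DreamSim) that would be measured after running multi-token textual inversion at each token budget.

```python
from typing import Mapping, Optional


def tokens_needed(distance_by_tokens: Mapping[int, float],
                  threshold: float) -> Optional[int]:
    """Return the smallest token count whose reconstruction distance is
    at or below `threshold`, or None if no tested count reaches it.

    Assumption: `distance_by_tokens` maps a token budget to the perceptual
    distance between the target image and its reconstruction, measured
    elsewhere (e.g. after textual inversion with that many tokens).
    """
    for n_tokens in sorted(distance_by_tokens):
        if distance_by_tokens[n_tokens] <= threshold:
            return n_tokens
    return None


# Hypothetical distances, invented for illustration only.
familiar_image = {1: 0.28, 2: 0.24, 3: 0.22, 4: 0.21, 5: 0.20}
unusual_image = {1: 0.62, 2: 0.51, 3: 0.40, 4: 0.29, 5: 0.27}

THRESHOLD = 0.30  # hypothetical cutoff for an "accurate" reconstruction

familiar_score = tokens_needed(familiar_image, THRESHOLD)
unusual_score = tokens_needed(unusual_image, THRESHOLD)
print(familiar_score, unusual_score)
```

Under this reading, a larger returned token count corresponds to an image the model finds less familiar; a result of None means the image was not reconstructed within the tested token budgets.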
