SolidMark: Evaluating Image Memorization in Generative Models
Nicky Kriplani, Minh Pham, Gowthami Somepalli, Chinmay Hegde, Niv Cohen
TL;DR
This work tackles the challenge of reliably quantifying memorization in diffusion-based image generation by introducing SolidMark, a per-image memorization score derived from prompting a model to outpaint a random grayscale border attached to each training image. By decoupling the key (the random border) from image content and evaluating border reconstruction accuracy, SolidMark enables fine-grained, pixel-level memorization assessment and is compatible with varying datasets and architectures. The authors perform extensive ablations, data-duplication studies, and a pretraining-from-scratch experiment, showing that SolidMark can reveal fine-grained memorization and expose limitations of existing metrics and mitigation approaches. They also release border-injected diffusion models to encourage further research, highlighting SolidMark’s potential to complement traditional metrics in measuring memorization and to inform robust mitigation strategies in practice.
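The border-as-key idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the border width of 4 pixels, and the choice of absolute difference in mean border intensity as the error measure are all assumptions made for illustration, and the "model outputs" are hand-made stand-ins rather than real outpainting results.

```python
import numpy as np


def add_solid_border(image, key, width=4):
    """Pad an HxWxC float image in [0, 1] with a solid grayscale border.

    `key` is the border intensity; `width` is a hypothetical default.
    """
    pad = ((width, width), (width, width), (0, 0))
    return np.pad(image, pad, mode="constant", constant_values=key)


def border_mask(shape, width=4):
    """Boolean HxW mask that is True on the border ring and False inside."""
    mask = np.ones(shape[:2], dtype=bool)
    mask[width:-width, width:-width] = False
    return mask


def border_error(outpainted, key, width=4):
    """Assumed score: |mean predicted border intensity - true key|.

    A small error means the border (the key) was reproduced closely.
    """
    mask = border_mask(outpainted.shape, width)
    predicted = float(outpainted[mask].mean())
    return abs(predicted - key)


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # stand-in for a training image
key = float(rng.random())         # random grayscale key, independent of content
bordered = add_solid_border(image, key)

# Stand-in for a model that reproduces the training border exactly.
exact = bordered.copy()

# Stand-in for a model that has not memorized the key and fills the
# border with a fixed mid-gray guess instead.
guess = bordered.copy()
guess[border_mask(guess.shape)] = 0.5

exact_error = border_error(exact, key)
guess_error = border_error(guess, key)
```

Because the key is drawn independently of the image, a model that has not memorized the training sample has no way to infer it from the content, so a low border error on a given image is evidence about that specific image rather than about the dataset as a whole.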
Abstract
Recent works have shown that diffusion models are able to memorize training images and emit them at generation time. However, the metrics used to evaluate memorization and its mitigation techniques suffer from dataset-dependent biases and struggle to detect whether a given specific image has been memorized or not. This paper begins with a comprehensive exploration of issues surrounding memorization metrics in diffusion models. Then, to mitigate these issues, we introduce SolidMark, a novel evaluation method that provides a per-image memorization score. We then re-evaluate existing memorization mitigation techniques. We also show that SolidMark is capable of evaluating fine-grained pixel-level memorization. Finally, we release a variety of models based on SolidMark to facilitate further research for understanding memorization phenomena in generative models. All of our code is available at https://github.com/NickyDCFP/SolidMark.
