SolidMark: Evaluating Image Memorization in Generative Models

Nicky Kriplani, Minh Pham, Gowthami Somepalli, Chinmay Hegde, Niv Cohen

TL;DR

This work tackles the challenge of reliably quantifying memorization in diffusion-based image generation by introducing SolidMark, a per-image memorization score obtained by prompting a model to outpaint a random grayscale border (the key) attached to each training image. Because the key is independent of image content and the score measures how accurately the border is reconstructed, SolidMark enables fine-grained, pixel-level memorization assessment and works across datasets and architectures. The authors perform extensive ablations, data-duplication studies, and a pretraining-from-scratch experiment, showing that SolidMark can detect fine-grained memorization and expose limitations of existing metrics and mitigation approaches. They also release border-injected diffusion models to encourage further research, highlighting SolidMark’s potential to complement traditional memorization metrics and to inform robust mitigation strategies.

Abstract

Recent works have shown that diffusion models are able to memorize training images and emit them at generation time. However, the metrics used to evaluate memorization and its mitigation techniques suffer from dataset-dependent biases and struggle to detect whether a given specific image has been memorized or not. This paper begins with a comprehensive exploration of issues surrounding memorization metrics in diffusion models. Then, to mitigate these issues, we introduce SolidMark, a novel evaluation method that provides a per-image memorization score. We then re-evaluate existing memorization mitigation techniques. We also show that SolidMark is capable of evaluating fine-grained pixel-level memorization. Finally, we release a variety of models based on SolidMark to facilitate further research for understanding memorization phenomena in generative models. All of our code is available at https://github.com/NickyDCFP/SolidMark.

Paper Structure

This paper contains 45 sections, 4 equations, 7 figures, 8 tables, 2 algorithms.

Figures (7)

  • Figure 1: An overview of SolidMark. We begin by augmenting training images with random scalar keys in the form of grayscale borders. Next, we inject these keys into the model by training it on these augmented images. To query for a key, we ask the model to outpaint a training image's border using the training caption as the text prompt. We retrieve its prediction at the key by averaging the outpainted border. Finally, we report the distance between the predicted key and the true value.
  • Figure 2: $\bar{\ell}_2$ distance reports monochromatic images as memorizations. Despite not being memorizations of their nearest neighbors in the training set, monochromatic images generate a low $\bar{\ell}_2$ distance. (Top) Out of 5,000 generations, the 10 generations with smallest patched $\bar{\ell}_2$ distance from CIFAR-10 train. (Bottom) The corresponding nearest neighbors in CIFAR-10 train to the top row of generations.
  • Figure 3: 95th percentile scoring fails to capture fine-grained reductions in memorization. The above graphs demonstrate how a 95th percentile metric can fail to report successful memorization reduction. (Top) A distribution showing the density (vertical axis) of different similarity values (horizontal axis) in a model's baseline results. (Bottom) The memorization-reduced evaluation, where the 95th percentile did not change at all despite clear memorization reductions shown in the 96th percentile.
  • Figure 4: Samples from Pretrained Text-to-Image Model. (Top) Prompts used to generate images from our pretrained model. (Bottom) The resultant images for the respective prompt.
  • Figure 5: Augmentations Applied to Query Images. We show examples of the augmentations used to validate SolidMark's fine-grainedness. Implementation details are given in the appendix.
  • ...and 2 more figures
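
The key-injection and scoring steps described in the Figure 1 caption can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the border width of 4 pixels, and the use of an absolute difference as the distance are all assumptions, and the diffusion model's outpainting step is replaced by hand-made stand-in arrays.

```python
import numpy as np

def add_key_border(image, key, width=4):
    """Pad an HxWxC float image in [0, 1] with a solid grayscale border of value `key`.
    (Hypothetical helper; the border width is an assumed parameter.)"""
    h, w, c = image.shape
    out = np.full((h + 2 * width, w + 2 * width, c), key, dtype=image.dtype)
    out[width:-width, width:-width] = image
    return out

def border_mask(shape, width=4):
    """Boolean HxW mask that is True on the border pixels and False in the interior."""
    mask = np.ones(shape[:2], dtype=bool)
    mask[width:-width, width:-width] = False
    return mask

def predicted_key(outpainted, width=4):
    """Recover the model's key prediction by averaging the outpainted border."""
    return float(outpainted[border_mask(outpainted.shape, width)].mean())

def key_distance(outpainted, true_key, width=4):
    """Distance between predicted and true key (absolute difference is an assumption)."""
    return abs(predicted_key(outpainted, width) - true_key)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))   # stand-in for a training image
    key = 0.25                        # in practice a random scalar per image
    augmented = add_key_border(image, key)

    # Stand-in for a model that memorized the image: it reproduces the border.
    memorized = augmented.copy()
    # Stand-in for a model that did not: it paints an unrelated border value.
    unrelated = augmented.copy()
    unrelated[border_mask(augmented.shape)] = 0.75

    print(key_distance(memorized, key))
    print(key_distance(unrelated, key))
```

In this toy setup a small distance indicates that the border key was recovered, which is the signal the caption describes; in a real evaluation the two stand-in arrays would be replaced by the model's outpainting of the border given the training caption.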

Theorems & Definitions (2)

  • Definition 1: Eidetic Metric
  • Definition 2: Eidetic Memorization