Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models

Kartik Thakral, Tamar Glaser, Tal Hassner, Mayank Vatsa, Richa Singh

TL;DR

FADE (Fine-Grained Attenuation for Diffusion Erasure), which introduces adjacency-aware unlearning in diffusion models, achieves at least a 12% improvement in retention performance over state-of-the-art methods.

Abstract

Existing unlearning algorithms in text-to-image generative models often fail to preserve the knowledge of semantically related concepts when removing specific target concepts, a challenge known as adjacency. To address this, we propose FADE (Fine-Grained Attenuation for Diffusion Erasure), which introduces adjacency-aware unlearning in diffusion models. FADE comprises two components: (1) the Concept Neighborhood, which identifies an adjacency set of related concepts, and (2) Mesh Modules, which employ a structured combination of Expungement, Adjacency, and Guidance loss components. Together, these enable precise erasure of target concepts while preserving fidelity across related and unrelated concepts. Evaluated on datasets such as Stanford Dogs, Oxford Flowers, CUB, I2P, Imagenette, and ImageNet-1k, FADE effectively removes target concepts with minimal impact on correlated concepts, achieving at least a 12% improvement in retention performance over state-of-the-art methods.
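The idea of an adjacency set can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that each concept is represented by a handful of latent feature vectors, that similarity is measured as cosine similarity between per-concept mean features, and that the adjacency set is simply the top-k most similar concepts. The function name `concept_neighborhood`, the toy features, and the concept names are all hypothetical.

```python
import numpy as np


def concept_neighborhood(features, target, k=2):
    """Return the k concepts most similar to `target` (hypothetical helper).

    features: dict mapping concept name -> array of shape (n_samples, d).
    Similarity is cosine similarity between per-concept mean features,
    which is an assumption made for illustration only.
    """
    # Summarize each concept by the mean of its feature vectors.
    means = {name: feats.mean(axis=0) for name, feats in features.items()}
    target_dir = means[target] / np.linalg.norm(means[target])

    scores = {}
    for name, mean in means.items():
        if name == target:
            continue  # the target itself is never part of its adjacency set
        scores[name] = float(np.dot(target_dir, mean / np.linalg.norm(mean)))

    # Highest similarity first; keep the k closest concepts.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 3-dimensional features (made up for this example).
features = {
    "golden retriever": np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]]),
    "labrador": np.array([[0.9, 0.3, 0.0], [1.0, 0.2, 0.1]]),
    "flat-coated retriever": np.array([[0.8, 0.4, 0.0], [0.7, 0.5, 0.1]]),
    "airplane": np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]),
}

adjacency = concept_neighborhood(features, "golden retriever", k=2)
print(adjacency)
```

In this toy setup the two related dog breeds are selected and the unrelated concept is left out, which is the behavior an adjacency set is meant to capture: concepts close to the target are the ones most at risk of collateral forgetting and so are singled out for preservation.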

Paper Structure

This paper contains 20 sections, 1 theorem, 14 equations, 13 figures, 9 tables.

Key Result

Theorem 1

Let $\mathbf{x} \in \mathbb{R}^{h \times w \times c}$ represent an image with dimensions height $h$, width $w$, and channels $c$. Let the mapping function $\phi: \mathbb{R}^{h \times w \times c} \to \mathbb{R}^d$ project the image $\mathbf{x}$ into a latent feature space $\mathbb{R}^d$, where $d \ll

Figures (13)

  • Figure 1: Fine-Grained Concept Erasure: This figure demonstrates the issue of collateral forgetting (termed adjacency) in selective concept erasure using existing state-of-the-art algorithms in text-to-image diffusion-based foundation models. It highlights the inability of existing methods to precisely erase target concepts from a model’s knowledge while preserving its ability to generate closely related concepts.
  • Figure 2: Visual illustration of the complete erasure process. (a) The dataset $D$ is organized into the unlearning set $\mathcal{D}_u$ and the adjacency set $\mathcal{A}(c_{\text{tar}})$ using the concept neighborhood; (b) these sets are utilized by the mesh modules for selective erasure while maintaining the semantic integrity of the model on neighboring concepts.
  • Figure 3: Qualitative comparison between existing and proposed algorithms for erasing target concepts and testing retention on neighboring fine-grained concepts. We visualize one target concept each from the Stanford Dogs, Oxford Flowers, and CUB datasets. Visualizations for more concepts are available in the supplementary.
  • Figure 4: Radar plot comparing FADE with existing unlearning methods (ESD, FMN, SPM, Receler) by structural similarity score (circular axis, %) and adjacency accuracy (radial axes) on concepts from the ImageNet-1k dataset. Most methods begin to degrade beyond a similarity score of 70%, with SPM resilient until 90% and FADE showing the highest robustness. For fair analysis, only methods with $A_{\text{er}} \leq 20\%$ are considered.
  • Figure 5: NudeNet Evaluation on the I2P benchmark. The numbers followed by "SD" indicate the count of exposed body parts in the SD v1.4 generations. The binplots show the reduction achieved by different methods for erasing nudity. Compared to prior works, FADE effectively eliminates explicit content across various nude categories.
  • ...and 8 more figures

Theorems & Definitions (1)

  • Theorem 1: k-NN Approximation to Naive Bayes in $\mathbb{R}^d$