Unified Concept Editing in Diffusion Models

Rohit Gandikota; Hadas Orgad; Yonatan Belinkov; Joanna Materzyńska; David Bau

TL;DR

This work tackles the simultaneous safety concerns of copyright adherence, offensive content, and social biases in text-to-image diffusion models by introducing Unified Concept Editing (UCE). UCE provides a closed-form cross-attention weight update that can apply hundreds of edits in one pass, generalizing prior techniques such as TIME and MEMIT to diffusion models. Edits are categorized as erasing, debiasing, or moderation, each implemented via a unified objective that preserves unedited concepts through targeted preservation terms, with an explicit update formula $W = \big( \textstyle\sum_{c_i\in E} v_i^* c_i^T + \textstyle\sum_{c_j\in P} W^{old} c_j c_j^T \big) \big( \textstyle\sum_{c_i\in E} c_i c_i^T + \textstyle\sum_{c_j\in P} c_j c_j^T \big)^{-1}$. Experiments demonstrate effective artistic style erasure, multi-attribute debiasing (gender and race), and NSFW moderation with reduced interference on non-target concepts, supporting scalable, post-training safety editing for real-world deployment.
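The update formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: the function name `uce_update`, the row-wise array layout, and the optional `ridge` term (which adds `ridge * W_old` and `ridge * I` to keep the second factor invertible when there are fewer concepts than input dimensions) are assumptions introduced here. With `ridge=0.0` the function computes exactly the formula in the TL;DR.

```python
import numpy as np


def uce_update(W_old, edit_keys, edit_targets, preserve_keys, ridge=0.0):
    """Closed-form weight update, following the TL;DR formula.

    W_old:         (d_out, d_in) original projection matrix W^{old}
    edit_keys:     (n_edit, d_in) rows are the edited embeddings c_i in E
    edit_targets:  (n_edit, d_out) rows are the target outputs v_i^*
    preserve_keys: (n_pres, d_in) rows are the preserved embeddings c_j in P
    ridge:         hypothetical regularizer (not in the source formula) that
                   pulls W toward W_old and keeps the inverse well defined
    """
    W_old = np.asarray(W_old, dtype=np.float64)
    C_e = np.asarray(edit_keys, dtype=np.float64)
    V_e = np.asarray(edit_targets, dtype=np.float64)
    C_p = np.asarray(preserve_keys, dtype=np.float64)
    d_in = W_old.shape[1]

    # First factor: sum_i v_i^* c_i^T + sum_j W_old c_j c_j^T (+ ridge * W_old)
    numerator = V_e.T @ C_e + W_old @ (C_p.T @ C_p) + ridge * W_old
    # Second factor: sum_i c_i c_i^T + sum_j c_j c_j^T (+ ridge * I)
    denominator = C_e.T @ C_e + C_p.T @ C_p + ridge * np.eye(d_in)

    # W = numerator @ inv(denominator). The denominator is symmetric, so we
    # solve denominator @ W^T = numerator^T instead of forming the inverse.
    return np.linalg.solve(denominator, numerator.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_old = rng.normal(size=(3, 4))
    C_p = rng.normal(size=(3, 4))   # three preserved concepts
    C_e = rng.normal(size=(1, 4))   # one edited concept
    V_e = rng.normal(size=(1, 3))   # its target output
    W = uce_update(W_old, C_e, V_e, C_p)
    print("edit error:    ", np.abs(W @ C_e.T - V_e.T).max())
    print("preserve error:", np.abs(W @ C_p.T - W_old @ C_p.T).max())
```

How the targets $v_i^*$ are chosen depends on the application (erasing, debiasing, or moderation). One plausible choice for erasure, stated here as an assumption, is $v_i^* = W^{old} c_*$ for some replacement embedding $c_*$, so that the edited concept is mapped to the output of a different concept.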

Abstract

Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a method that tackles all issues with a single approach. Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution, and scales seamlessly to concurrent edits on text-conditional diffusion models. We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections, and we present extensive experiments demonstrating improved efficacy and scalability over prior work. Our code is available at https://unified.baulab.info

Paper Structure (30 sections, 24 equations, 21 figures, 6 tables, 1 algorithm)

This paper contains 30 sections, 24 equations, 21 figures, 6 tables, 1 algorithm.

Introduction
Related Work
Copyright issues.
Offensive content.
Social biases.
Model editing.
Background
Method
Erasing
Debiasing
Moderation
Experiments
Erasing
Artist erasure
Erasing Objects
...and 15 more sections

Figures (21)

Figure 1: Our method enables unified and efficient editing of multiple concepts in text-to-image models through closed-form modifications to attention weights. We present applications to debias, erase, and moderate concepts at scale. Debiasing professions leads the edited model to generate fairer gender and race ratios. Erasing an artistic style removes characteristics associated with a particular creator. Moderating the model reduces the likelihood of generating inappropriate images.
Figure 2: Closed-form editing of cross-attention weights enables concept manipulation in diffusion models. Our method modifies the attention weights to induce targeted changes to the keys and values corresponding to specific text embeddings for a set of edited concepts $c_i\in E$ while minimizing changes to a set of preserved concepts $c_j\in P$. This dual objective allows debiasing, erasing, or moderating concepts while preserving unrelated ones. The same editing function is applied in all cases, but the target keys and values are set differently per application. Because the edit is closed-form, modifying the attention weights given the new key and value mappings takes less than 1 minute, which enables efficient simultaneous editing of multiple concepts.
Figure 3: Our method and ESD-x show strong erasing capabilities. The erasing capabilities of SDD and Ablation start to dilute as the number of concepts being erased increases.
Figure 4: Our method preserves the remaining knowledge of the model better after the edit. The figure shows images generated from different editing methods, for the same prompts and seeds, across a variety of artists that are not erased. Our method exhibits lower LPIPS, indicating less change to unerased concepts during model editing. Similarly for COCO, we find that our method has better CLIP scores across all the scales. This demonstrates that our method has significantly reduced interference compared to other fine-tuning approaches when editing.
Figure 5: Our method improves the gender representation of professions in images generated by Stable Diffusion. We find that the images precisely change the gender while keeping the rest of the scene intact.
...and 16 more figures
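
The dual objective described in the Figure 2 caption can be read as a least-squares problem. The sketch below is reconstructed from the closed-form update quoted in the TL;DR and uses the same symbols ($W^{old}$, $v_i^*$, $E$, $P$); the unweighted squared-error form is an assumption of this sketch.

$$\min_W \; \sum_{c_i\in E} \lVert W c_i - v_i^* \rVert_2^2 \;+\; \sum_{c_j\in P} \lVert W c_j - W^{old} c_j \rVert_2^2$$

Setting the gradient with respect to $W$ to zero gives

$$\sum_{c_i\in E} (W c_i - v_i^*)\, c_i^T \;+\; \sum_{c_j\in P} (W c_j - W^{old} c_j)\, c_j^T = 0,$$

which rearranges to

$$W \Big( \sum_{c_i\in E} c_i c_i^T + \sum_{c_j\in P} c_j c_j^T \Big) = \sum_{c_i\in E} v_i^* c_i^T + \sum_{c_j\in P} W^{old} c_j c_j^T .$$

When the matrix on the left is invertible, multiplying both sides by its inverse on the right recovers the update formula from the TL;DR.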
