Editing Massive Concepts in Text-to-Image Diffusion Models

Tianwei Xiong; Yue Wu; Enze Xie; Yue Wu; Zhenguo Li; Xihui Liu

TL;DR

EMCID (Editing Massive Concepts In Diffusion Models) is a two-stage method that offers a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications. The paper also introduces a comprehensive benchmark for evaluating massive concept editing in T2I models.

Abstract

Text-to-image diffusion models risk generating outdated, copyrighted, incorrect, and biased content. While previous methods have mitigated these issues on a small scale, it is essential to handle them simultaneously in larger-scale real-world scenarios. We propose a two-stage method, Editing Massive Concepts In Diffusion Models (EMCID). The first stage performs memory optimization for each individual concept with dual self-distillation from a text alignment loss and a diffusion noise prediction loss. The second stage conducts massive concept editing with multi-layer, closed-form model editing. We further propose a comprehensive benchmark, named ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing for T2I models with two subtasks, free-form prompts, massive concept categories, and extensive evaluation metrics. Extensive experiments conducted on our proposed benchmark and previous benchmarks demonstrate the superior scalability of EMCID for editing up to 1,000 concepts, providing a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications.
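
As a rough illustration of the stage I objective, the sketch below combines a text alignment term with a noise prediction term. The function name, the argument names, the use of mean squared error, and the `text_weight` factor are all assumptions made for illustration; the abstract does not specify the exact form of either loss.

```python
import numpy as np

def dual_distillation_loss(emb_edit, emb_dst, eps_edit, eps_dst, text_weight=1.0):
    """Toy dual objective (hypothetical form, not taken from the paper).

    emb_edit / emb_dst: text embeddings of the edited source prompt and the
                        destination prompt.
    eps_edit / eps_dst: noise predictions conditioned on the edited source
                        prompt and on the destination prompt.
    """
    # Text alignment term: pull the edited embedding toward the destination one.
    text_loss = np.mean((emb_edit - emb_dst) ** 2)
    # Noise prediction term: pull the edited noise prediction toward the
    # destination-conditioned prediction.
    noise_loss = np.mean((eps_edit - eps_dst) ** 2)
    return text_weight * text_loss + noise_loss
```

In this toy form, the loss is zero only when both the embeddings and the noise predictions of the edited source concept coincide with those of the destination concept.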

Paper Structure (31 sections, 14 equations, 19 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Method
Overview
Stage I: Memory Optimization with Dual Self-Distillation
Stage II: Model Editing for Massive Concepts
Benchmark
Data Collection
Task Definition
Evaluation Metrics
Experiments
Experiments Setup
Large-Scale Arbitrary Concept Editing
Concept Rectification
Erasing Artist Styles
...and 16 more sections

Figures (19)

Figure 1: Our method EMCID generally edits source concepts, the concepts intended to be modified, to match destination concepts, the concepts towards which source concepts are to be altered. Our method can update, forget, rectify, and debias various concepts simultaneously at a large scale.
Figure 2: The two-stage pipeline of EMCID. We demonstrate stage I with the example of updating the source concept, "the US president", as the destination concept "Joe Biden". In the first stage, we align both the embeddings of the text prompts and the noise predictions $\boldsymbol{\epsilon}_{\text{dst}}\triangleq \boldsymbol{\epsilon}(\mathbf{x}_{t}, \mathbf{c}_{\text{dst}},t)$ and $\boldsymbol{\epsilon}_{\text{src}}\triangleq \boldsymbol{\epsilon}(\mathbf{x}_{t}, \mathbf{c}_{\text{src}}, t)$. Multiple source concepts can be independently updated. In stage II, we edit the MLPs of the intermediate layers of the text encoder using a closed-form solution based on the independent values obtained from stage I.
Figure 3: Comparisons on the task of Arbitrary Concept Editing. Dot markers denote methods that edit source concepts into designated concepts, and cross markers denote concept erasing methods. Source2Dest and Alias2Dest are not suitable for comparing concept erasing methods and are therefore not shown for them. EMCID successfully edits up to 300 concepts with minor influence on holdout concepts. In comparison, the baselines' success in editing source concepts and preserving holdout concepts declines rapidly as the number of edits increases.
Figure 4: The qualitative comparison between EMCID and UCE on the task of rectifying misunderstood aliases. The correct generation results are wrapped in green, while the incorrect ones are wrapped in red. EMCID presents remarkable efficacy while the baseline method, UCE, often fails to rectify the aliases effectively.
Figure 5: Comparisons between EMCID and the baseline method, UCE, focusing on the preservation of holdout artist styles and overall generation capabilities after erasing a large number of artist styles. (a) The qualitative results on the left showcase the preservation of the style of The Great Wave off Kanagawa by Hokusai. (b) The quantitative results on the right demonstrate the preservation of both 500 holdout artist styles and the overall generation capabilities. (c) Our method excels at preserving the unique styles of holdout artists, particularly when removing more than 500 styles. Moreover, the drop in the overall generation capabilities caused by EMCID is negligible even after erasing 1,000 styles.
...and 14 more figures
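
The closed-form editing step named in the Figure 2 caption can be illustrated with a generic regularized least-squares update of a single linear layer. This is a sketch under stated assumptions, not the paper's exact formulation: the objective, the function name `closed_form_edit`, the arguments `K_edit`, `V_target`, `K_keep`, and the weight `lam` are all hypothetical, and the sketch covers one layer only, whereas the caption describes edits across several intermediate layers.

```python
import numpy as np

def closed_form_edit(W, K_edit, V_target, K_keep, lam=1.0):
    """Generic closed-form edit of one linear layer (illustrative only).

    Minimizes ||W_new @ K_edit - V_target||^2
              + lam * ||W_new @ K_keep - W @ K_keep||^2
    over W_new. Columns of K_edit are inputs for the concepts being edited,
    columns of V_target are the desired outputs for them (e.g. values found
    in a per-concept optimization stage), and columns of K_keep are inputs
    whose outputs should stay close to the original ones.
    """
    # Residual between desired and current outputs on the edited inputs.
    R = V_target - W @ K_edit
    # Normal-equation matrix; assumed invertible (K_edit and K_keep together
    # span the input space).
    A = K_edit @ K_edit.T + lam * (K_keep @ K_keep.T)
    # Solve delta @ A = R @ K_edit.T. A is symmetric, so solve the transpose.
    delta = np.linalg.solve(A, K_edit @ R.T).T
    return W + delta
```

A larger `lam` keeps the layer closer to its original behavior on the preserved inputs, while a smaller `lam` fits the target values more tightly; all concepts in `K_edit` are handled in one solve, which is what makes a closed-form update attractive when many concepts are edited at once.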
