Editing Massive Concepts in Text-to-Image Diffusion Models

Tianwei Xiong, Yue Wu, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu

TL;DR

Editing Massive Concepts In Diffusion Models (EMCID) is a two-stage method that offers a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications. The paper also introduces a comprehensive benchmark for evaluating massive concept editing in T2I models.

Abstract

Text-to-image diffusion models suffer from the risk of generating outdated, copyrighted, incorrect, and biased content. While previous methods have mitigated the issues on a small scale, it is essential to handle them simultaneously in larger-scale real-world scenarios. We propose a two-stage method, Editing Massive Concepts In Diffusion Models (EMCID). The first stage performs memory optimization for each individual concept with dual self-distillation from text alignment loss and diffusion noise prediction loss. The second stage conducts massive concept editing with multi-layer, closed-form model editing. We further propose a comprehensive benchmark, named ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing for T2I models with two subtasks, free-form prompts, massive concept categories, and extensive evaluation metrics. Extensive experiments conducted on our proposed benchmark and previous benchmarks demonstrate the superior scalability of EMCID for editing up to 1,000 concepts, providing a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications.

Paper Structure

This paper contains 31 sections, 14 equations, 19 figures, 8 tables, 1 algorithm.

Figures (19)

  • Figure 1: In general, our method EMCID edits source concepts (the concepts to be modified) to match destination concepts (the concepts toward which the source concepts are altered). It can update, forget, rectify, and debias various concepts simultaneously at a large scale.
  • Figure 2: The two-stage pipeline of EMCID. Stage I is illustrated with the example of updating the source concept, "the US president", to the destination concept, "Joe Biden". In the first stage, we align both the embeddings of the text prompts and the noise predictions $\boldsymbol{\epsilon}_{\text{dst}}\triangleq \boldsymbol{\epsilon}(\mathbf{x}_{t}, \mathbf{c}_{\text{dst}},t)$ and $\boldsymbol{\epsilon}_{\text{src}}\triangleq \boldsymbol{\epsilon}(\mathbf{x}_{t}, \mathbf{c}_{\text{src}}, t)$. Multiple source concepts can be updated independently. In stage II, we edit the MLPs of the intermediate layers of the text encoder using a closed-form solution based on the independent values obtained from stage I.
  • Figure 3: Comparisons on the task of Arbitrary Concept Editing. Dot markers denote methods that edit source concepts into designated concepts, and cross markers denote concept-erasing methods. Source2Dest and Alias2Dest do not apply to concept-erasing methods and are therefore not reported for them. EMCID successfully edits up to 300 concepts with only minor influence on holdout concepts. In comparison, the baselines' success in editing source concepts and preserving holdout concepts declines rapidly as the number of edits increases.
  • Figure 4: Qualitative comparison between EMCID and UCE on the task of rectifying misunderstood aliases. Correct generation results are wrapped in green, and incorrect ones in red. EMCID is remarkably effective, whereas the baseline method, UCE, often fails to rectify the aliases.
  • Figure 5: Comparisons between EMCID and the baseline method, UCE, focusing on the preservation of holdout artist styles and overall generation capabilities after erasing a large number of artist styles. (a) The qualitative results on the left showcase the preservation of the style of The Great Wave off Kanagawa by Hokusai. (b) The quantitative results on the right demonstrate the preservation of both 500 holdout artist styles and the overall generation capabilities. (c) Our method excels at preserving the unique styles of holdout artists, particularly when removing more than 500 styles. Moreover, the drop in overall generation capabilities caused by EMCID is negligible even after erasing 1,000 styles.
  • ...and 14 more figures
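
The abstract and the Figure 2 caption mention a closed-form edit of MLP weights but do not give the formula in this excerpt. As a rough illustration only, the sketch below shows a generic regularized least-squares update for a single linear layer: given key vectors for the source concepts and target output values, it returns new weights that map the keys to the targets while staying close to the original weights. This is not claimed to be EMCID's exact formulation. The function name `closed_form_edit`, the regularization weight `lam`, and the matrix shapes are all assumptions made for the example.

```python
import numpy as np


def closed_form_edit(W, K, V_target, lam=1e-2):
    """Hypothetical helper: closed-form update of one linear layer.

    Minimizes ||W_new @ K - V_target||_F^2 + lam * ||W_new - W||_F^2.

    W        : (d_out, d_in) original weights
    K        : (d_in, n)     one key column per edited concept
    V_target : (d_out, n)    desired layer outputs for those keys
    lam      : assumed ridge weight; larger values keep W_new closer to W
    """
    d_in = W.shape[1]
    residual = V_target - W @ K              # what the current layer gets wrong
    gram = K @ K.T + lam * np.eye(d_in)      # symmetric positive definite
    # delta = residual @ K.T @ inv(gram); solved without forming the inverse,
    # using the symmetry of gram.
    delta = np.linalg.solve(gram, K @ residual.T).T
    return W + delta


# Toy usage with random data (shapes are arbitrary).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
K = rng.normal(size=(8, 3))        # three edited concepts
V_target = rng.normal(size=(4, 3))
W_new = closed_form_edit(W, K, V_target, lam=1e-6)
```

Because all edited keys enter through one linear solve, the cost of this sketch is dominated by a single `d_in × d_in` system regardless of how many concepts are edited at once, which is the usual appeal of closed-form editing over per-concept fine-tuning.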