Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion

Hengrui Gu, Kaixiong Zhou, Yili Wang, Ruobing Wang, Xin Wang

TL;DR

A straightforward implementation of MPE (based on in-context learning) exhibits better overall performance than previous model editors, and the authors hope these efforts can further promote faithful evaluation of T2I knowledge editing methods.

Abstract

During pre-training, Text-to-Image (T2I) diffusion models encode factual knowledge into their parameters. These parameterized facts enable realistic image generation, but they may become obsolete over time, thereby misrepresenting the current state of the world. Knowledge editing techniques aim to update model knowledge in a targeted way. However, facing the dual challenges of inadequate editing datasets and an unreliable evaluation criterion, T2I knowledge editing struggles to generalize injected knowledge effectively. In this work, we design a T2I knowledge editing framework spanning three phases. First, we curate a dataset, CAKE, comprising paraphrase and multi-object tests, to enable more fine-grained assessment of knowledge generalization. Second, we propose a novel criterion, the adaptive CLIP threshold, to filter out false successful images under the current criterion and achieve reliable editing evaluation. Finally, we introduce MPE, a simple but effective approach for T2I knowledge editing. Instead of tuning parameters, MPE recognizes and edits the outdated part of the conditioning text prompt to accommodate up-to-date knowledge. A straightforward implementation of MPE (based on in-context learning) exhibits better overall performance than previous model editors. We hope these efforts can further promote faithful evaluation of T2I knowledge editing methods.
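
The prompt-editing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says MPE is built on in-context learning, whereas this stand-in uses literal, case-insensitive phrase matching so that it stays self-contained. The function name `edit_prompt` and the `edits` mapping are hypothetical; the example pair ("the U.S. president" to "Joe Biden") is taken from the Figure 2 caption.

```python
import re


def edit_prompt(prompt: str, edits: dict[str, str]) -> str:
    """Rewrite outdated phrases in a T2I prompt before it reaches the model.

    `edits` maps an outdated phrase to its up-to-date replacement.
    Assumption: literal matching stands in for the in-context-learning
    step described in the abstract; no model parameters are touched.
    """
    if not edits:
        return prompt
    # Case-insensitive lookup table keyed by the lowercased phrase.
    lookup = {phrase.lower(): target for phrase, target in edits.items()}
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(p) for p in phrases) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Single pass, so a replacement is never itself rewritten.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], prompt)


edits = {"the U.S. president": "Joe Biden"}
print(edit_prompt("A photo of the U.S. president giving a speech", edits))
# A photo of Joe Biden giving a speech
```

Literal matching leaves a paraphrase such as "the American president" unchanged, which is the paraphrase-generalization gap the paper targets; recognizing such paraphrases is the part a language model would have to handle.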

Paper Structure (20 sections, 4 equations, 11 figures, 7 tables, 1 algorithm)


Figures (11)

  • Figure 1: Challenges in T2I knowledge editing. The timeline shows the order in which the images were generated. (a) Existing editing approaches often fail on paraphrases of the edit prompt, such as "the American president". We term this situation Paraphrase Generalization Failure. (b) The edited model struggles with inputs that involve multiple pieces of edited knowledge. We refer to this case as Compositionality Generalization Failure.
  • Figure 2: An editing evaluation example ($p_{\mathrm{edit}}=$"the U.S. president", $p_{\mathrm{tar}}=$"Joe Biden"). A closer distance between two embedding points implies higher similarity, i.e., a higher CLIP-Score. The images with borders are false successful images under the current criterion. For each evaluation prompt, the adaptive CLIP threshold approximates the ideal decision boundary and filters out the false successful images.
  • Figure 3: Macro-F1 performance across different criteria / threshold operators, using Qwen-vl-max as the pseudo-label generator. "Current" refers to the current, classification-based criterion.
  • Figure 4: The basic workflow of MPE.
  • Figure 5: Qualitative examples from the CAKE dataset. The (# num) refers to the size of the edit batch.
  • ...and 6 more figures