Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion

Hengrui Gu; Kaixiong Zhou; Yili Wang; Ruobing Wang; Xin Wang

TL;DR

A straightforward implementation of MPE, based on in-context learning, achieves better overall performance than previous model editors. The authors hope these efforts will further promote faithful evaluation of T2I knowledge editing methods.

Abstract

During pre-training, Text-to-Image (T2I) diffusion models encode factual knowledge into their parameters. These parameterized facts enable realistic image generation, but they may become obsolete over time, thereby misrepresenting the current state of the world. Knowledge editing techniques aim to update model knowledge in a targeted way. However, facing the dual challenges of inadequate editing datasets and an unreliable evaluation criterion, T2I knowledge editing methods struggle to generalize injected knowledge effectively. In this work, we design a T2I knowledge editing framework spanning three phases. First, we curate a dataset, CAKE, comprising paraphrase and multi-object tests, to enable more fine-grained assessment of knowledge generalization. Second, we propose a novel criterion, the adaptive CLIP threshold, to filter out the false successful images that pass under the current criterion and thereby achieve reliable editing evaluation. Finally, we introduce MPE, a simple but effective approach for T2I knowledge editing. Instead of tuning parameters, MPE recognizes and edits the outdated part of the conditioning text prompt to accommodate the up-to-date knowledge. A straightforward implementation of MPE (based on in-context learning) exhibits better overall performance than previous model editors. We hope these efforts can further promote faithful evaluation of T2I knowledge editing methods.
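The prompt-editing idea behind MPE can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says MPE is built on in-context learning, whereas the code below substitutes plain case-insensitive string matching as a stand-in. The names `EDIT_MEMORY` and `edit_prompt` are hypothetical, and the example edit reuses the "the U.S. president" to "Joe Biden" pair from the Figure 2 caption.

```python
from __future__ import annotations

import re

# Hypothetical edit memory: outdated phrase -> up-to-date replacement.
# The single entry mirrors the example edit used in the Figure 2 caption.
EDIT_MEMORY = {
    "the U.S. president": "Joe Biden",
}


def edit_prompt(prompt: str, memory: dict[str, str]) -> str:
    """Replace every stored outdated phrase that appears in the prompt.

    Assumption: exact, case-insensitive matching stands in for the
    learned recognizer described in the abstract.
    """
    edited = prompt
    # Try longer phrases first so a short phrase cannot pre-empt a longer one.
    for phrase in sorted(memory, key=len, reverse=True):
        pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
        # A callable replacement avoids backslash interpretation in the target.
        edited = pattern.sub(lambda _m, repl=memory[phrase]: repl, edited)
    return edited


if __name__ == "__main__":
    # The edited prompt would then be passed to an unmodified T2I model.
    print(edit_prompt("A photo of the U.S. president waving", EDIT_MEMORY))
```

Exact matching leaves a paraphrase such as "the American president" untouched, which is presumably the gap that a learned, in-context recognizer is meant to close.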

Paper Structure (20 sections, 4 equations, 11 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Text-to-image Knowledge Editing
Preliminaries
Counterfactual Assessment of Text-to-image Knowledge Editing
Adaptive CLIP Threshold Criterion
MPE: A Proposal for Text-to-Image Knowledge Editing
Experiments
Experimental Setup
Single Editing Results
Multiple Editing Results
Performance Analysis on the Retriever Component
Time Overhead Analysis of the Adaptive CLIP Threshold
Conclusion
Statistics and Construction Details of CAKE
...and 5 more sections

Figures (11)

Figure 1: Illustration of the challenges in T2I knowledge editing. The timeline shows the order in which the images were generated. (a) Existing editing approaches often fail on paraphrases of the edit prompt, such as "the American president". We term this situation Paraphrase Generalization Failure. (b) The edited model struggles with inputs that involve multiple pieces of edited knowledge. We refer to this case as Compositionality Generalization Failure.
Figure 2: An editing evaluation example ($p_{\mathrm{edit}}=$"the U.S. president", $p_{\mathrm{tar}}=$"Joe Biden"). A smaller distance between two embedding points implies higher similarity, i.e., a higher CLIP-Score. The images with borders are false successful images under the current criterion. For each evaluation prompt, the adaptive CLIP threshold closely approximates the ideal decision boundary and effectively filters out the false successful images.
Figure 3: Macro-F1 performance of different criteria / threshold operators, using Qwen-vl-max as the pseudo-label generator. "Current" refers to the current, classification-based criterion.
Figure 4: The basic workflow of MPE.
Figure 5: Qualitative examples from the CAKE dataset. The (# num) refers to the size of the edit batch.
...and 6 more figures
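The comparison described in the Figure 3 caption, scoring a success criterion against pseudo-labels with Macro-F1, can be sketched as follows. This is only an illustration under stated assumptions: it uses a single fixed CLIP-Score threshold, because how the adaptive CLIP threshold is computed is not described in this section. The function names, the scores, and the pseudo-labels are all made up for the example.

```python
def threshold_criterion(scores, threshold):
    """Mark an image as a successful edit (1) if its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for the binary labels 0 and 1."""
    f1_scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # Convention chosen here: a class with no support and no predictions scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    clip_scores = [0.31, 0.22, 0.27, 0.18]  # similarity of each image to the target
    pseudo_labels = [1, 0, 0, 0]            # e.g. produced by a vision-language model

    for threshold in (0.25, 0.30):
        preds = threshold_criterion(clip_scores, threshold)
        print(threshold, preds, round(macro_f1(pseudo_labels, preds), 4))
```

In this toy data the looser threshold admits one false successful image, which lowers Macro-F1; a threshold that tracks the decision boundary more closely removes it.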
