IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment

Shangkun Sun; Bowen Qu; Xiaoyu Liang; Songlin Fan; Wei Gao

TL;DR

IE-Bench tackles the lack of human-aligned evaluation for text-driven image editing by introducing IE-DB, a MOS-annotated dataset of 301 source images, editing prompts, and results across multiple editing methods, accompanied by IE-QA, a multi-modal quality assessment model. IE-QA explicitly models image-text alignment, the source-target relationship, and visual quality, using a CLIP-based text-visual branch, a source-target feature fusion, and an IP-IQA-inspired quality branch, trained with the loss $L = L_{plcc} + \alpha \cdot L_{rank}$ where $\alpha = 0.3$. Across 10-fold cross-validation, IE-QA demonstrates superior correlation with human judgments compared to diverse baselines, achieving SROCC up to $0.752$, PLCC $0.750$, KRCC $0.554$, and RMSE $1.045$, highlighting the importance of incorporating source information and text guidance in IQA for editing. The work provides a practical, publicly released benchmark and model to drive fairer, perception-aligned evaluation of text-driven image editing systems.
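The combined objective $L = L_{plcc} + \alpha \cdot L_{rank}$ with $\alpha = 0.3$ can be illustrated with a minimal sketch. The summary above does not define the two terms, so the forms used here are assumptions: $L_{plcc}$ is taken as $(1 - \mathrm{PLCC})/2$, and $L_{rank}$ as a mean pairwise hinge penalty on pairs whose predicted order disagrees with the MOS order. The function names are hypothetical and this is not the authors' implementation.

```python
import math

def plcc(pred, target):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (std_p * std_t + 1e-8)  # epsilon guards against zero variance

def plcc_loss(pred, target):
    # ASSUMED form: near 0 for perfect correlation, near 1 for perfect anti-correlation.
    return (1.0 - plcc(pred, target)) / 2.0

def rank_loss(pred, target):
    # ASSUMED form: hinge penalty averaged over all ordered pairs (i, j);
    # a pair contributes only when the predicted order contradicts the MOS order.
    n = len(pred)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sign = (target[i] > target[j]) - (target[i] < target[j])
            total += max(0.0, -sign * (pred[i] - pred[j]))
    return total / (n * n)

def total_loss(pred, target, alpha=0.3):
    # L = L_plcc + alpha * L_rank, with alpha = 0.3 as stated in the summary.
    return plcc_loss(pred, target) + alpha * rank_loss(pred, target)

if __name__ == "__main__":
    mos = [2.0, 4.0, 6.0, 8.0]          # toy MOS values, not from IE-DB
    good = [1.0, 2.0, 3.0, 4.0]         # same ordering as the MOS values
    bad = [4.0, 3.0, 2.0, 1.0]          # reversed ordering
    print(total_loss(good, mos), total_loss(bad, mos))
```

Under these assumed definitions, predictions that preserve the MOS ordering and scale linearly with it drive both terms toward zero, while a reversed ordering is penalised by both the correlation term and the ranking term.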

Abstract

Recent advances in text-driven image editing have been significant, yet accurately evaluating the edited images remains a considerable challenge. Unlike text-driven image generation, text-driven image editing conditions simultaneously on both text and a source image. The edited images often retain an intrinsic connection to the original image, which changes dynamically with the semantics of the text. However, previous methods tend to focus solely on text-image alignment or are not aligned with human perception. In this work, we introduce the Text-driven Image Editing Benchmark suite (IE-Bench) to enhance the assessment of text-driven edited images. IE-Bench includes a database containing diverse source images, various editing prompts, the corresponding results from different editing methods, and a total of 3,010 Mean Opinion Scores (MOS) provided by 25 human subjects. Furthermore, we introduce IE-QA, a multi-modality, source-aware quality assessment method for text-driven image editing. To the best of our knowledge, IE-Bench offers the first IQA dataset and model tailored for text-driven image editing. Extensive experiments demonstrate that IE-QA aligns better with subjective human judgments on the text-driven image editing task than previous metrics. We will make all related data and code available to the public.

Paper Structure (25 sections, 4 equations, 6 figures, 3 tables)

Introduction
Related Work
Image Quality Assessment
Metrics for Image Editing
Datasets for Image Editing
Methods for Image Editing
Text-driven Image Editing Database
Source Image Collection
Prompt Selection
Image Editing
Subjective Study
Statistics Analysis
Method
Text-driven Image Editing Quality Assessment
Image-Text Alignment
...and 10 more sections

Figures (6)

Figure 1: Overview of the proposed IE-Bench.
Figure 2: Collection of source images. (a) Sources of images. (b) Categories of images. (c) Content of images. (d)-(f) Fine-grained classification of human images by camera, pose, and action, respectively.
Figure 3: Statistics of IE-DB prompts. (a) Word cloud of IE-Bench DB prompts. (b) Proportion of different types
Figure 4: Z-score MOS distributions of different editing methods.
Figure 5: Network architecture of IE-QA.
...and 1 more figures
