LocateEdit-Bench: A Benchmark for Instruction-Based Editing Localization

Shiyu Wu; Shuyan Li; Jing Li; Jing Liu; Yequan Wang

TL;DR

This work tackles the challenge of localizing edits produced by modern instruction-based image editing models, which provide no explicit edit masks and produce changes that blend semantically into the scene. It introduces LocateEdit-Bench, a $231$K-sample dataset generated with $4$ editors across $3$ edit types, plus a multi-metric evaluation framework to assess localization methods. Extensive benchmarking shows that state-of-the-art localization techniques struggle with instruction-based edits and generalize poorly across editors, highlighting the need for editor-agnostic, semantically aware approaches. The dataset and findings provide a foundation for advancing robust forgery localization in the era of AI-powered image manipulation, with open-sourcing planned upon acceptance.

Abstract

Recent advancements in image editing have enabled highly controllable and semantically aware alteration of visual content, posing unprecedented challenges to manipulation localization. However, existing AI-generated forgery localization methods primarily focus on inpainting-based manipulations, making them ineffective against the latest instruction-based editing paradigms. To bridge this gap, we propose LocateEdit-Bench, a large-scale dataset comprising $231$K edited images, designed specifically to benchmark localization methods against instruction-driven image editing. Our dataset incorporates four cutting-edge editing models and covers three common edit types. We conduct a detailed analysis of the dataset and develop two multi-metric evaluation protocols to assess existing localization methods. Our work establishes a foundation to keep pace with the evolving landscape of image editing, thereby facilitating the development of effective methods for future forgery localization. The dataset will be open-sourced upon acceptance.
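The abstract refers to multi-metric evaluation of localization methods without naming the metrics on this page. As a minimal sketch, assuming pixel-level IoU and F1 between a predicted binary mask and a ground-truth edit mask (two metrics commonly used for manipulation localization, not necessarily the ones the paper adopts), the comparison could look like this. The function names and the toy masks are hypothetical.

```python
from typing import Sequence, Tuple

# A 2-D binary mask: 1 marks an edited pixel, 0 an untouched one.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives over all pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of the edited regions (1.0 if both masks are empty)."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0


def f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 score (1.0 if both masks are empty)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy masks for illustration only; not taken from the dataset.
truth_mask = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]
pred_mask = [[0, 0, 1],
             [0, 1, 1],
             [0, 0, 1]]

print(iou(pred_mask, truth_mask), f1(pred_mask, truth_mask))
```

In this toy case three pixels overlap, one is a false positive, and one is missed, so the two scores differ; reporting several metrics side by side is one way to expose such differences.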

Paper Structure (17 sections, 7 figures, 4 tables)

Introduction
Related Work
Image Editing
Image Manipulation Datasets
LocateEdit-Bench Dataset
Images and Instructions Collection
Image Editing
Mask Generation
Analyses of LocateEdit-Bench
Experiment
Evaluation Protocols
Metrics
Baselines
Full-Set Evaluation
Cross-Editor Generalization
...and 2 more sections

Figures (7)

Figure 1: Comparison of two datasets using different editing approaches. Localizing edits in instruction-based image editing is particularly difficult because the edits are semantically coherent and visually seamless.
Figure 2: Construction pipeline of LocateEdit-Bench. We carefully select suitable real-world images and editing prompts from a large-scale editing dataset. We then employ four recent image editing models to generate $231$K high-quality edited images. Finally, precise masks are generated using a high-quality semantic segmentation model.
Figure 3: Samples of LocateEdit-Bench. LocateEdit-Bench constitutes a comprehensive benchmark for editing localization, featuring high-resolution images edited by four different models, and containing three edit types applied to targets of diverse sizes.
Figure 4: Category distribution of LocateEdit-Bench and word cloud of editing instructions.
Figure 5: Comparison of feature distributions in edited regions and background areas. Edited regions show significantly reduced colorfulness and brightness, yet exhibit slightly increased spatial information (SI).
...and 2 more figures
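The Figure 5 caption compares brightness and colorfulness inside edited regions against the background. The sketch below shows one way such per-region statistics could be computed from an image and its edit mask. It assumes Rec. 601 luma as the brightness measure and the Hasler-Süsstrunk colorfulness measure; the page does not say which definitions the authors use, so both are assumptions, and spatial information (SI) is left out. The function name and the toy image are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def region_stats(image, mask, inside=True):
    """Brightness and colorfulness over the masked region (or its complement).

    image: 2-D grid of (R, G, B) tuples; mask: 2-D grid of 0/1 flags.
    inside=True selects edited pixels, inside=False selects the background.
    """
    pixels = [
        px
        for img_row, mask_row in zip(image, mask)
        for px, m in zip(img_row, mask_row)
        if bool(m) == inside
    ]
    if not pixels:
        raise ValueError("selected region is empty")

    # Assumed brightness definition: mean Rec. 601 luma.
    brightness = mean(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)

    # Assumed colorfulness definition: Hasler-Suesstrunk opponent-channel statistics.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    colorfulness = (
        sqrt(pstdev(rg) ** 2 + pstdev(yb) ** 2)
        + 0.3 * sqrt(mean(rg) ** 2 + mean(yb) ** 2)
    )
    return {"brightness": brightness, "colorfulness": colorfulness}


# Toy 2x2 image for illustration only: top row is "edited" (gray),
# bottom row is background (saturated yellow-orange).
toy_image = [[(100, 100, 100), (100, 100, 100)],
             [(255, 200, 0), (255, 200, 0)]]
toy_mask = [[1, 1],
            [0, 0]]

edited = region_stats(toy_image, toy_mask, inside=True)
background = region_stats(toy_image, toy_mask, inside=False)
print(edited, background)
```

Computing the same statistics for both regions of every sample and comparing their distributions would give the kind of comparison the caption describes; the toy values here say nothing about the actual dataset.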
