
LocateEdit-Bench: A Benchmark for Instruction-Based Editing Localization

Shiyu Wu, Shuyan Li, Jing Li, Jing Liu, Yequan Wang

TL;DR

This work tackles the challenge of localizing edits produced by modern instruction-based image editing models, which lack explicit edit masks and semantically camouflage changes. It introduces LocateEdit-Bench, a dataset of $231$K edited images generated with four editors across three edit types, plus a multi-metric evaluation framework to assess localization methods. Through extensive benchmarking, the study reveals that state-of-the-art localization techniques struggle with instruction-based edits and fare poorly in cross-editor generalization, highlighting the need for editor-agnostic, semantically aware approaches. The dataset and findings provide a foundation for advancing robust forgery localization in the era of AI-powered image manipulation, with open-sourcing planned upon acceptance.

Abstract

Recent advancements in image editing have enabled highly controllable and semantically aware alteration of visual content, posing unprecedented challenges to manipulation localization. However, existing AI-generated forgery localization methods primarily focus on inpainting-based manipulations, making them ineffective against the latest instruction-based editing paradigms. To bridge this critical gap, we propose LocateEdit-Bench, a large-scale dataset comprising $231$K edited images, designed specifically to benchmark localization methods against instruction-driven image editing. Our dataset incorporates four cutting-edge editing models and covers three common edit types. We conduct a detailed analysis of the dataset and develop two multi-metric evaluation protocols to assess existing localization methods. Our work establishes a foundation to keep pace with the evolving landscape of image editing, thereby facilitating the development of effective methods for future forgery localization. The dataset will be open-sourced upon acceptance.
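The abstract does not name the metrics used in its evaluation protocols. As a rough illustration of what scoring a localization method involves, the sketch below computes two pixel-level measures that are common in forgery localization, IoU and F1, between a predicted binary mask and a ground-truth edit mask. The choice of metrics, the function names, and the toy masks are all assumptions for illustration, not the benchmark's actual protocol.

```python
from typing import List, Tuple

# A mask is a grid of 0/1 values: 1 marks an edited pixel, 0 an untouched one.
Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of the predicted and true edited regions."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumed convention: two empty masks agree perfectly.
    return tp / union if union else 1.0


def f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 score (harmonic mean of precision and recall)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks agree perfectly.
    return 2 * tp / denom if denom else 1.0


# Hypothetical 3x3 example: the prediction misses one edited pixel
# and flags one untouched pixel.
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
pred = [[0, 0, 1],
        [0, 1, 1],
        [0, 1, 0]]

print(f"IoU = {iou(pred, truth):.2f}, F1 = {f1(pred, truth):.2f}")
```

In practice a localization model outputs a per-pixel probability map, so a threshold (or a threshold-free measure such as AUC) is needed before counts like these can be taken; that step is omitted here.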

Paper Structure (17 sections, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Comparison of two datasets using different editing approaches. Localizing edits in instruction-based image editing is particularly difficult because the edits are semantically coherent and visually seamless.
  • Figure 2: Construction pipeline of LocateEdit-Bench. We carefully select suitable real-world images and editing prompts from a large-scale editing dataset. We then employ four of the latest image editing models to generate $231$K high-quality edited images. Finally, precise masks are generated using a high-quality semantic segmentation model.
  • Figure 3: Samples of LocateEdit-Bench. LocateEdit-Bench constitutes a comprehensive benchmark for editing localization, featuring high-resolution images edited by four different models, and containing three edit types applied to targets of diverse sizes.
  • Figure 4: Category distribution of LocateEdit-Bench and word cloud of editing instructions.
  • Figure 5: Comparison of feature distributions in edited regions and background areas. Edited regions show significantly reduced colorfulness and brightness, yet exhibit slightly increased spatial information (SI).
  • ...and 2 more figures