
PromptForge-350k: A Large-Scale Dataset and Contrastive Framework for Prompt-Based AI Image Forgery Localization

Jianpeng Wang, Haoyu Wang, Baoying Chen, Jishen Zeng, Yiming Qin, Yiqi Yang, Zhongjie Ba

Abstract

The rapid democratization of prompt-based AI image editing has recently exacerbated the risks associated with malicious content fabrication and misinformation. However, forgery localization methods targeting these emerging editing techniques remain significantly under-explored. To bridge this gap, we first introduce a fully automated mask annotation framework that leverages keypoint alignment and semantic space similarity to generate precise ground-truth masks for edited regions. Based on this framework, we construct PromptForge-350k, a large-scale forgery localization dataset covering four state-of-the-art prompt-based AI image editing models, thereby mitigating the data scarcity in this domain. Furthermore, we propose ICL-Net, an effective forgery localization network featuring a triple-stream backbone and intra-image contrastive learning. This design enables the model to capture highly robust and generalizable forensic features. Extensive experiments demonstrate that our method achieves an IoU of 62.5% on PromptForge-350k, outperforming SOTA methods by 5.1%. Additionally, it exhibits strong robustness against common degradations with an IoU drop of less than 1%, and shows promising generalization capabilities on unseen editing models, achieving an average IoU of 41.5%.

Paper Structure

This paper contains 25 sections, 3 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Overview of our work: (1) A fully automated mask annotation framework targeting prompt-based AI image editing. (2) PromptForge-350k, a comprehensive forgery localization dataset. (3) ICL-Net, an effective forgery localization network.
  • Figure 2: Pipeline of Pixel-Level Alignment: The process involves keypoint extraction and matching, followed by affine matrix estimation. Finally, we apply a coordinate transformation, and the black borders are cropped out.
  • Figure 3: Pipeline of Semantic-Based Mask Annotation. We first utilize DINO v3 to extract semantic features, followed by calculating the pixel-wise feature similarity. Finally, the similarity map is binarized to output the edited region mask.
  • Figure 4: Statistics of the proposed dataset. Left: Distribution of editing task categories. Right: Time consumption breakdown of each operation during the dataset construction process.
  • Figure 5: Architecture of the proposed forgery localization network, ICL-Net. The network features three parallel backbones and is optimized via intra-image contrastive loss and segmentation loss.
  • ...and 1 more figure
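
The semantic-based mask annotation step in Figure 3 can be illustrated with a minimal sketch: given per-pixel feature maps for the original and the (aligned) edited image, compute the pixel-wise cosine similarity and binarize it so low-similarity pixels become the edited-region mask. Note the threshold value and the `(H, W, C)` feature layout here are illustrative assumptions, not values from the paper, and real features would come from a backbone such as DINO v3.

```python
import numpy as np

def edited_region_mask(feat_orig, feat_edit, threshold=0.85):
    """Sketch of semantic-based mask annotation (cf. Figure 3).

    feat_orig, feat_edit: (H, W, C) per-pixel feature maps for the
    original and the pixel-aligned edited image. The 0.85 threshold
    is an illustrative assumption, not the paper's setting.
    """
    # L2-normalize features along the channel axis
    a = feat_orig / np.linalg.norm(feat_orig, axis=-1, keepdims=True)
    b = feat_edit / np.linalg.norm(feat_edit, axis=-1, keepdims=True)
    # Pixel-wise cosine similarity, shape (H, W)
    sim = np.sum(a * b, axis=-1)
    # Binarize: pixels with low similarity are marked as edited (1)
    return (sim < threshold).astype(np.uint8)
```

In this sketch, unedited pixels keep near-identical features (cosine similarity close to 1) and fall below the mask, while edited pixels diverge semantically and are flagged; the keypoint alignment of Figure 2 would run beforehand so the comparison is pixel-accurate.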