SAFIRE: Segment Any Forged Image Region

Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam, Minji Son, Changick Kim

TL;DR

SAFIRE addresses image forgery localization (IFL) by partitioning an image into the regions that originate from different sources, rather than predicting a single binary forgery mask. It introduces region-to-region contrastive pretraining and a point-prompted training regime. Together they enable stable segmentation of multiple source regions, which can be inferred from a grid of point prompts. The approach achieves state-of-the-art performance on binary IFL benchmarks. On the SafireMS dataset, it also shows that images can be partitioned into multiple sources, which supports provenance-aware forgery analysis. This work offers a scalable, promptable framework for analyzing complex, real-world forgeries and lays groundwork for provenance-driven visual forensics.
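The idea behind region-to-region contrastive pretraining is that features from the same source region are pulled together and features from different source regions are pushed apart (Figure 3). The sketch below shows that idea as a generic supervised InfoNCE-style loss over feature vectors labeled with a source-region id. This summary does not give SAFIRE's exact loss, so the formulation, the temperature value, and the name `region_contrastive_loss` are all assumptions for illustration.

```python
import numpy as np

def region_contrastive_loss(features, labels, temperature=0.1):
    """Hypothetical supervised contrastive loss (a sketch, not SAFIRE's exact loss).

    features: (N, D) feature vectors, e.g. sampled from an image encoder's output
    labels:   (N,) integer source-region id of each feature
    Features that share a source id are positives; all other features are negatives.
    """
    # L2-normalize so that dot products are cosine similarities
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each feature's similarity with itself from the softmax
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = positives.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0  # no anchor has a same-source partner
    # Mean log-probability assigned to same-source features, per anchor
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())
```

With features that form one tight cluster per source, the loss is near zero when the labels match the clusters. It is large when the labels mix the clusters, and that gap is the signal that drives the encoder toward source-discriminative features.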

Abstract

Most techniques approach the problem of image forgery localization as a binary segmentation task, training neural networks to label original areas as 0 and forged areas as 1. In contrast, we tackle this issue from a more fundamental perspective by partitioning images according to their originating sources. To this end, we propose Segment Any Forged Image Region (SAFIRE), which solves forgery localization using point prompting. Each point on an image is used to segment the source region that contains it. This allows us to partition images into multiple source regions, a capability achieved here for the first time. Additionally, rather than memorizing specific forgery traces, SAFIRE naturally focuses on the uniform characteristics within each source region. This leads to more stable and effective learning, achieving superior performance in both the new task and traditional binary forgery localization.
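The point-prompting idea can be made concrete by looking at how a training target is built for a single prompt. The sketch below assumes a ground-truth map that holds one integer id per source. The helper names `point_target` and `iou` are hypothetical. Using IoU as the regression target for the confidence score mentioned in Figure 4 is also an assumption, borrowed from SAM-style mask decoders and not confirmed by this summary.

```python
import numpy as np

def point_target(source_labels, point):
    """Hypothetical helper: binary mask of the source region containing the prompt.

    source_labels: (H, W) integer map with one id per originating source
    point:         (row, col) coordinates of the prompt point
    """
    r, c = point
    return source_labels == source_labels[r, c]

def iou(pred_mask, target_mask):
    """Intersection over union of two boolean masks.

    Assumed here as the target for the predicted confidence score.
    Two empty masks are treated as a perfect match.
    """
    inter = np.logical_and(pred_mask, target_mask).sum()
    union = np.logical_or(pred_mask, target_mask).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Binary IFL is the two-source special case: 0 = authentic, 1 = forged.
labels = np.array([[0, 0, 1, 1]] * 4)
forged_target = point_target(labels, (2, 3))  # a point in the forged region
```

The same rule covers the binary and multi-source settings. A point inside the forged area yields the forged mask, and a point inside any other source yields that source's mask, with no fixed "forged = 1" convention.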

Paper Structure

This paper contains 39 sections, 13 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: The forged image is composed of three source regions. Previous methods are limited to binary prediction, i.e., segmenting forged regions. In contrast, our SAFIRE is also capable of multi-source prediction, i.e., distinguishing regions according to the source image they originate from.
  • Figure 2: Overview of how SAFIRE conducts IFL. An image, along with point prompts, is input into the model. The model segments the source region containing each point, and these results are combined to produce the final output.
  • Figure 3: Pretraining. Features originating from the same source region become closer in the feature space, while those from different source regions move apart, enabling the image encoder to learn information that distinguishes source regions.
  • Figure 4: Training. The adapter and mask decoder are trained to effectively segment the source region that includes the given point. They are also trained to output a confidence score for this prediction map, which is used at inference.
  • Figure 5: Inference. Multiple points in a grid pattern are input, and a prediction map is obtained for each point. Clustering is performed using the corresponding representative features, and the final prediction is produced.
  • ...and 9 more figures
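The inference procedure in Figure 5 places grid prompts, produces one prediction per point, clusters those predictions by their representative features, and merges them into a final output. A minimal sketch of that flow appears below. The model's per-point outputs are passed in as plain arrays. Several details are assumptions not stated in this summary: k-means as the clustering method, a known number of sources, and confidence-weighted averaging of the maps within each cluster. All function names are hypothetical.

```python
import numpy as np

def point_grid(height, width, n_per_side):
    """Hypothetical helper: n_per_side x n_per_side prompt points at grid-cell centers."""
    rows = ((np.arange(n_per_side) + 0.5) * height / n_per_side).astype(int)
    cols = ((np.arange(n_per_side) + 0.5) * width / n_per_side).astype(int)
    return [(int(r), int(c)) for r in rows for c in cols]

def kmeans(x, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest chosen center
        d = np.min(((x[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        assign = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign

def fuse_predictions(prob_maps, confidences, rep_features, n_sources):
    """Combine per-point predictions into one partition of the image.

    prob_maps:    (P, H, W) per-point probability that a pixel shares the point's source
    confidences:  (P,) predicted confidence score of each map
    rep_features: (P, D) representative feature of each prediction
    n_sources:    number of source regions to recover (assumed known here)
    """
    assign = kmeans(rep_features, n_sources)
    h, w = prob_maps.shape[1:]
    scores = np.zeros((n_sources, h, w))
    for j in range(n_sources):
        members = assign == j
        if members.any():
            weights = confidences[members]
            # Confidence-weighted average of the maps in this cluster
            scores[j] = np.tensordot(weights, prob_maps[members], axes=1) / (weights.sum() + 1e-8)
    # Each pixel is assigned to the cluster whose fused map scores highest
    return scores.argmax(axis=0)

# Toy example: left half from one source, right half from another.
gt = np.zeros((4, 4), dtype=int)
gt[:, 2:] = 1
points = point_grid(4, 4, 2)
prob_maps = np.stack([(gt == gt[r, c]) * 0.9 + 0.05 for r, c in points])
confidences = np.ones(len(points))
rep_features = np.array([[1.0, 0.0] if gt[r, c] == 0 else [0.0, 1.0] for r, c in points])
partition = fuse_predictions(prob_maps, confidences, rep_features, n_sources=2)
```

Cluster ids are arbitrary labels for sources, not "authentic" or "forged". In this toy case they happen to match `gt` because the first prompt lies in the left region. Deciding which cluster counts as forged for a binary output would need an additional rule.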