Can Image Splicing and Copy-Move Forgery Be Detected by the Same Model? Forensim: An Attention-Based State-Space Approach

Soumyaroop Nandi, Prem Natarajan

TL;DR

Forensim detects and localizes both the source and the forged (target) regions of a manipulated image, covering splicing and copy-move forgeries alike. It introduces a unified three-class supervision framework built on similarity and manipulation state-space attention within a vision-transformer backbone, producing source and target masks along with a detection signal. A key contribution is CMFD-Anything, a high-quality, high-resolution dataset designed to train robust copy-move detectors in the absence of comprehensive public data. Experiments across copy-move forgery detection (CMFD) and image manipulation detection and localization (IMDL) benchmarks report state-of-the-art performance, strong generalization, and robustness to common perturbations, which the authors present as a practical, interpretable solution for image forensics.

Abstract

We introduce Forensim, an attention-based state-space framework for image forgery detection that jointly localizes both manipulated (target) and source regions. Unlike traditional approaches that rely solely on artifact cues to detect spliced or forged areas, Forensim is designed to capture duplication patterns crucial for understanding context. In scenarios such as protest imagery, detecting only the forged region, for example a duplicated act of violence inserted into a peaceful crowd, can mislead interpretation, highlighting the need for joint source-target localization. Forensim outputs three-class masks (pristine, source, target) and supports detection of both splicing and copy-move forgeries within a unified architecture. We propose a visual state-space model that leverages normalized attention maps to identify internal similarities, paired with a region-based block attention module to distinguish manipulated regions. This design enables end-to-end training and precise localization. Forensim achieves state-of-the-art performance on standard benchmarks. We also release CMFD-Anything, a new dataset addressing limitations of existing copy-move forgery datasets.
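
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the Forensim architecture. It assumes patch features are already available as a `(num_patches, dim)` array and shows (1) how a row-normalized self-similarity map exposes a duplicated patch as a strong off-diagonal match, and (2) how source and target masks can be packed into a single three-class label map. The function names, the label ids, the temperature value, and the rule that the target label wins on overlap are all illustrative assumptions.

```python
import numpy as np

PRISTINE, SOURCE, TARGET = 0, 1, 2  # hypothetical label ids for the three-class mask


def self_similarity(features: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Row-normalized (softmax) cosine similarity between patch features.

    features has shape (num_patches, dim). Self-matches are masked out, so
    each row describes how strongly a patch resembles the other patches.
    """
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    logits = unit @ unit.T / temperature
    np.fill_diagonal(logits, -np.inf)            # ignore trivial self-similarity
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def three_class_mask(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Combine boolean source/target masks into one label map."""
    mask = np.full(source.shape, PRISTINE, dtype=np.uint8)
    mask[source] = SOURCE
    mask[target] = TARGET  # assumption: target wins where the two overlap
    return mask


rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32))                  # 16 toy patches, 32-dim features
feats[11] = feats[3] + 0.01 * rng.normal(size=32)  # patch 11 is a near-copy of patch 3

attn = self_similarity(feats)
best_match = attn.argmax(axis=1)  # most similar other patch, per patch
```

In this toy setup, patches 3 and 11 select each other as best match. The similarity map is symmetric, so it cannot tell which of the two is the source and which is the pasted target; that is the role the abstract assigns to the separate manipulation-oriented attention module.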

Paper Structure

This paper contains 23 sections, 21 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Proposed CMFD-Anything samples. Rows show: (a) forged image, (b) RGB mask, (c) forged image, (d) RGB mask. The source-target mask encodes untampered, source, and target regions.
  • Figure 2: Forensim Overview: Sim-Mani Attention and Fusion
  • Figure 3: Similarity State Space Attention Module
  • Figure 3: Pixel- and image-level results on the CASIA and CoMoFoD CMFD datasets. Bold = best, underline = second-best.
  • Figure 4: Manipulation State Space Attention Module
  • ...and 4 more figures