
MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery Detection

Liangwei Jiang, Jinluo Xie, Yecheng Huang, Hua Zhang, Hongyu Yang, Di Huang

TL;DR

This paper tackles the growing challenge of copy-move forgery detection under rotation, scaling, and deep-synthesized tampering. It introduces the Multi-directional Similarity Network (MSN), a two-stream framework that combines a multi-directional, multi-scale representation with a 2-D similarity matrix decoder to improve region localization. MSN achieves state-of-the-art results on the classic benchmarks CASIA CMFD and CoMoFoD and shows strong robustness on the newly proposed deep-synthesized forgery dataset (DCF), where fine-tuning on synthetic data brings further improvements. The work also provides extensive ablation analyses and reports fast inference, underscoring both the effectiveness and the practicality of the approach for real-world CMFD tasks in the era of deepfake-like manipulations.

Abstract

Copy-move image forgery aims to duplicate certain objects or to hide specific content through copy-move operations, which can be carried out by a sequence of manual manipulations as well as by recent deep generative network-based swapping. Its detection is becoming increasingly challenging because of the complex transformations and fine-tuned operations applied to the tampered regions. In this paper, we propose a novel two-stream model, namely the Multi-directional Similarity Network (MSN), for accurate and efficient copy-move forgery detection. It addresses two major limitations of existing deep detection models, in representation and in localization respectively. For representation, an image is hierarchically encoded by a multi-directional CNN, and owing to the diverse augmentation in scales and rotations, the resulting features better measure the similarity between patches sampled in the two streams. For localization, we design a decoder based on a 2-D similarity matrix; compared with the current decoder based on a 1-D similarity vector, it makes full use of the spatial information across the entire image, which improves the detection of tampered regions. Beyond the method, we present a new forgery database generated by various deep neural networks as a new benchmark for detecting the growing number of deep-synthesized copy-move forgeries. Extensive experiments are conducted on two classic image forensics benchmarks, i.e., CASIA CMFD and CoMoFoD, and on the newly presented one. The reported state-of-the-art results demonstrate the effectiveness of the proposed approach.
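The abstract contrasts a decoder built on a 1-D similarity vector with one built on a 2-D similarity matrix. As a hedged illustration only, not the authors' implementation, the sketch below computes cosine similarities between one query location and every location of a toy feature map. The 2-D view keeps the scores on the spatial grid, while the 1-D view sorts and truncates them to the top k, a variant assumed here for illustration. The names `similarity_map`, `similarity_vector`, and `k`, the use of cosine similarity, and the zeroing of the trivial self-match are all assumptions.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors (assumed layout)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def similarity_map(feat: FeatureMap, i: int, j: int) -> List[List[float]]:
    """Hypothetical 2-D view: similarity of (i, j) to every location, kept on the H x W grid.
    The self-match at (i, j) is zeroed because it is trivially 1."""
    h, w = len(feat), len(feat[0])
    smap = [[cosine(feat[i][j], feat[y][x]) for x in range(w)] for y in range(h)]
    smap[i][j] = 0.0
    return smap


def similarity_vector(feat: FeatureMap, i: int, j: int, k: int) -> List[float]:
    """Hypothetical 1-D view: the same scores flattened, sorted in descending order, top-k kept.
    Sorting discards *where* each matching location lies."""
    flat = [s for row in similarity_map(feat, i, j) for s in row]
    return sorted(flat, reverse=True)[:k]


# Toy 2 x 3 feature map: location (1, 2) is a "copy" of location (0, 0).
feat = [
    [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
    [[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]],
]
smap = similarity_map(feat, 0, 0)        # peak stays at grid position (1, 2)
vec = similarity_vector(feat, 0, 0, 3)   # approximately [1.0, 0.6, 0.0]; position is lost
```

Both views hold the same scores, but only the 2-D map still tells a decoder that the best match for (0, 0) lies at (1, 2), which is the kind of spatial information the abstract says the 2-D decoder exploits.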

Paper Structure

This paper contains 17 sections, 1 equation, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Two copy-move forgery examples, produced by manual manipulation (top row) and by a deep generative network (bottom row). From left to right: original images, forged images, and tampered regions.
  • Figure 2: Framework overview. Given a query image, a multi-directional image set and a zoomed-in patch set are constructed by rotating the image to four pre-defined quantized orientations and by slicing four patches out of the enlarged image, respectively. The image and patch features are extracted by CNNs with shared weights, and eight feature pairs are then formed by fusing the features of the basic input image with those of each augmented input. Each pair is fed into the two-stream detector to predict candidate copy-move tampered regions, where a 2-D similarity matrix based decoder is designed for more accurate localization. The outputs of the eight duplicated detectors are combined to jointly render a mask that localizes the final tampered regions. Best viewed in color.
  • Figure 3: Illustration of the challenges caused by rotation. S and T represent the source and target regions, respectively. The same color indicates high visual similarity.
  • Figure 4: Comparison of similarity vector and similarity map from tampered and pristine regions. Hot color indicates high similarity.
  • Figure 5: Detailed architecture of the Similarity Map Classifier, which predicts the 2-D tampered region mask.
  • ...and 5 more figures
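The Figure 2 caption describes how a multi-directional image set and a zoomed-in patch set are built and each paired with the query image, giving eight inputs to the duplicated detectors. Below is a minimal pure-Python sketch of that data flow. It rests on assumptions this summary does not state. The four quantized orientations are taken as 0, 90, 180, and 270 degrees. "Enlarged" is taken to mean a 2x nearest-neighbour upscale cut into a 2 x 2 grid. The eight detector outputs are assumed to be score maps already aligned with the query image and merged by a pixel-wise maximum followed by a 0.5 threshold. All function names are hypothetical, and feature extraction and the detectors themselves are omitted.

```python
from typing import List, Tuple

Grid = List[List[int]]          # single-channel image or binary mask, row-major
ScoreMap = List[List[float]]    # per-pixel tampering scores in [0, 1]


def rotate90(img: Grid, k: int = 1) -> Grid:
    """Rotate clockwise by k * 90 degrees (negative k rotates counter-clockwise)."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def upscale2x(img: Grid) -> Grid:
    """Nearest-neighbour 2x enlargement: every pixel becomes a 2 x 2 block."""
    return [[v for v in row for _ in range(2)] for row in img for _ in range(2)]


def quadrants(img: Grid) -> List[Grid]:
    """Slice an image into four equal patches: top-left, top-right, bottom-left, bottom-right."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[row[x0:x0 + w] for row in img[y0:y0 + h]]
            for y0 in (0, h) for x0 in (0, w)]


def build_pairs(img: Grid) -> List[Tuple[Grid, Grid]]:
    """Pair the query image with 4 rotated views and 4 zoomed-in patches (8 pairs)."""
    rotations = [rotate90(img, k) for k in range(4)]   # assumed 0/90/180/270 degrees
    patches = quadrants(upscale2x(img))                # assumed 2x zoom, 2 x 2 grid
    return [(img, aug) for aug in rotations + patches]


def fuse(score_maps: List[ScoreMap], thr: float = 0.5) -> Grid:
    """Assumed fusion rule: pixel-wise max over the detector outputs, then threshold."""
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[int(max(m[y][x] for m in score_maps) >= thr) for x in range(w)]
            for y in range(h)]


pairs = build_pairs([[1, 2], [3, 4]])
print(len(pairs))  # 8
mask = fuse([[[0.1, 0.7], [0.2, 0.0]], [[0.6, 0.1], [0.3, 0.4]]])
print(mask)  # [[1, 1], [0, 0]]
```

A pixel-wise maximum lets any single branch flag a pixel, which suits rotated copies that only one orientation brings into alignment. A mean would instead reward agreement across branches.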