Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization

Zijie Lou, Gang Cao, Kun Guo, Haochen Zhu, Lifang Yu

TL;DR

This work tackles image forgery localization by addressing the poorly structured feature space in pixel embeddings. It introduces Multi-view Pixel-wise Contrastive (MPC) learning, which pre-trains a high-resolution backbone using supervised contrastive loss from within-image, cross-scale, and cross-modality perspectives, followed by fine-tuning a localization head with cross-entropy. The approach yields a well-organized pixel feature space, improving intra-class compactness and inter-class separability, and demonstrates superior generalization across diverse datasets and robustness to complex post-processing, including online social network transformations. MPC achieves state-of-the-art or competitive performance with a lightweight model and shows strong qualitative results on traditional tampering and AI-generated manipulations. The method promises practical forensic utility in real-world scenarios where forgeries vary in scale and post-processing might be encountered.

Abstract

Image forgery localization, which aims to segment tampered regions in an image, is a fundamental yet challenging digital forensic task. While some deep learning-based forensic methods have achieved impressive results, they directly learn pixel-to-label mappings without fully exploiting the relationships between pixels in the feature space. To address this deficiency, we propose a Multi-view Pixel-wise Contrastive algorithm (MPC) for image forgery localization. Specifically, we first pre-train the backbone network with a supervised contrastive loss to model pixel relationships from three perspectives: within-image, cross-scale and cross-modality. This aims to increase intra-class compactness and inter-class separability. The localization head is then fine-tuned using the cross-entropy loss, resulting in a better pixel localizer. MPC is trained on training datasets of three different scales to allow a comprehensive and fair comparison with existing image forgery localization algorithms. Extensive experiments on the small-, medium- and large-scale training datasets show that the proposed MPC achieves higher generalization performance and robustness against post-processing than state-of-the-art methods. Code will be available at https://github.com/multimediaFor/MPC.
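
To make the pre-training objective more concrete, the following is a minimal sketch of a generic supervised contrastive loss applied to a handful of pixel embeddings. It is not the authors' implementation: the function names, the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions made for illustration, and the view-specific sampling (within-image, cross-scale, cross-modality) described in the abstract is not reproduced. It only shows the general idea that pixels sharing a label (pristine or forged) are pulled together while pixels with different labels are pushed apart.

```python
import math


def normalize(vec):
    """L2-normalize a feature vector (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supervised_pixel_contrast(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over sampled pixel embeddings.

    embeddings: list of feature vectors, one per sampled pixel (hypothetical input)
    labels:     list of ints, e.g. 0 = pristine pixel, 1 = forged pixel
    temperature: assumed value, not taken from the paper
    """
    feats = [normalize(e) for e in embeddings]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        # Scaled cosine similarity between anchor pixel i and every other pixel.
        logits = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        # Numerically stable log-sum-exp over all non-anchor pixels.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability of picking a same-class pixel.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: two tight clusters of pixel embeddings.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_separated = supervised_pixel_contrast(pixels, [0, 0, 1, 1])  # labels match clusters
loss_mixed = supervised_pixel_contrast(pixels, [0, 1, 0, 1])      # labels cut across clusters
```

In this toy setting the loss is small when the labels agree with the clusters and large when they cut across them, which is the intra-class compactness and inter-class separability the abstract refers to. In the two-stage scheme described above, an objective of this kind would shape the backbone features before the localization head is fine-tuned with cross-entropy.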

Paper Structure

This paper contains 15 sections, 6 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Detailed illustration of proposed image forgery localization network MPC.
  • Figure 2: Robustness against different post-processing manipulations on the Columbia dataset.
  • Figure 3: Visualization of robustness against combined post-processing manipulations on the DSO dataset.
  • Figure 4: Qualitative comparison of forgery localization on some representative testing images. From left to right: four splicing images, four copy-move images, and two removal images. From top to bottom: tampered image, ground truth (GT), and the localization results from CAT-Net, TruFor and our MPC.
  • Figure 5: Qualitative comparison of forgery localization on some novel tampered images. From left to right: three deepfake images and three local AI-generated images. From top to bottom: tampered image, ground truth (GT), and the localization results from CAT-Net, TruFor and our MPC.