Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features

Ziyong Wang, Charith Abhayaratne

TL;DR

The paper tackles localizing manipulated image regions without pixel-level annotations by fusing weak image-level signals with segmentation priors. It introduces a four-step pipeline based on WCBnet and the Cross-block Attention Module to produce multi-view activation maps, which are then refined with masks from the pre-trained segmentation models DeepLab, SAM, and PSPNet using Bayesian inference. Empirical results on CASIA2.0 show improved pixel-wise localization over the backbone and competitive performance against some fully supervised methods, along with an analysis of the strengths of each segmentation model. The work demonstrates the feasibility and practical value of weakly supervised manipulation localization, offering a pathway toward interpretable, region-specific detection without requiring dense annotations.

Abstract

The explosive growth of digital images and the widespread availability of image editing tools have made image manipulation detection an increasingly critical challenge. Although current deep learning-based manipulation detection methods achieve high image-level classification accuracy, they often fall short in interpretability and in localizing the manipulated regions. Additionally, the absence of pixel-wise annotations in real-world scenarios limits existing fully-supervised manipulation localization techniques. To address these challenges, we propose a novel weakly-supervised approach that integrates activation maps generated by image-level manipulation detection networks with segmentation maps from pre-trained models. Specifically, we build on our previous image-level work, WCBnet, to produce multi-view feature maps, which are subsequently fused for coarse localization. These coarse maps are then refined using detailed regional information provided by pre-trained segmentation models (such as DeepLab, Segment Anything and PSPNet), with Bayesian inference employed to enhance the manipulation localization. Experimental results demonstrate the effectiveness of our approach, highlighting the feasibility of localizing image manipulations without relying on pixel-level labels.
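
The refinement step described in the abstract can be illustrated with a minimal sketch. It assumes the coarse activation map is already normalized to [0, 1] and can be read as a per-pixel likelihood of manipulation, and that the segmentation model supplies an integer label map of the same size. The per-segment update below (Bayes' rule applied to the mean activation of each segment) is an illustrative assumption, not the formulation used in the paper; the function name `refine_with_segments` and the `prior` parameter are hypothetical.

```python
import numpy as np


def refine_with_segments(activation, segments, prior=0.5, eps=1e-6):
    """Refine a coarse activation map using segment regions (illustrative).

    activation: float array in [0, 1], shape (H, W); treated here as the
        likelihood of the evidence given manipulation (an assumption).
    segments: int array, shape (H, W), one label per segmented region.
    prior: assumed prior probability that a region is manipulated.
    """
    activation = np.clip(np.asarray(activation, dtype=float), eps, 1.0 - eps)
    segments = np.asarray(segments)
    refined = np.zeros_like(activation)
    for label in np.unique(segments):
        mask = segments == label
        # Region-level likelihood: mean activation inside the segment.
        like_manip = activation[mask].mean()
        like_clean = 1.0 - like_manip
        # Bayes' rule for two hypotheses: manipulated vs. authentic.
        posterior = like_manip * prior / (
            like_manip * prior + like_clean * (1.0 - prior)
        )
        # Every pixel in the segment receives the same posterior.
        refined[mask] = posterior
    return refined
```

With the default uniform prior the posterior reduces to the mean activation of each segment, so the segmentation mainly sharpens region boundaries; a lower prior suppresses weakly activated regions more strongly.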

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: The workflow of the proposed weakly-supervised manipulation localization model; each step is labeled and shown in a different color.
  • Figure 2: The feature maps of the backbone ResNet50 and WCBnet; $\text{WCBnet}_i$ denotes the CBAM with the shape of block $i$, while $\text{WCBnet}_m$ is their geometric mean; the Ref label is the pixel-wise label, shown for reference only.
  • Figure 3: The manipulated images and their corresponding image segmentation maps, generated by three state-of-the-art pre-trained methods.
  • Figure 4: Enhanced heatmaps of several manipulated images, combining activation maps from different extractors with different segmentation maps; the subscript of the model name refers to the associated segmentation model: d for DeepLab, s for SAM, and p for PSPNet.
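
The geometric-mean fusion mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming the per-block activation maps have already been resized to a common resolution and scaled to non-negative values; the function name `fuse_geometric_mean` and the `eps` guard are hypothetical and not taken from the paper.

```python
import numpy as np


def fuse_geometric_mean(maps, eps=1e-8):
    """Fuse same-sized activation maps by their pixel-wise geometric mean.

    maps: sequence of non-negative arrays, all of shape (H, W).
    eps: small constant that guards against log(0) (an assumption).
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in maps], axis=0)
    # Averaging in log space equals the n-th root of the product.
    return np.exp(np.log(stack + eps).mean(axis=0))
```

Because the geometric mean is a product, a pixel stays strongly activated only when every block agrees, which is one plausible reason to prefer it over an arithmetic mean for suppressing spurious responses from a single block.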