Pixel-Inconsistency Modeling for Image Manipulation Localization

Chenqi Kong; Anwei Luo; Shiqi Wang; Haoliang Li; Anderson Rocha; Alex C. Kot

TL;DR

This work establishes a comprehensive benchmark integrating 16 representative detection models across 12 datasets and proposes a novel Pixel-Inconsistency Data Augmentation strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces.

Abstract

Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methods perform poorly on unseen datasets and perturbed images (i.e., they lack the generalization and robustness that real-world applications require). To address these problems and help protect image integrity, this paper presents a generalized and robust manipulation localization model based on the analysis of pixel-inconsistency artifacts. The rationale is grounded in the observation that most image signal processors (ISPs) include a demosaicing step, which introduces pixel correlations in pristine images. Manipulation operations, including splicing, copy-move, and inpainting, directly disturb this pixel regularity. We therefore first split the input image into several blocks and design masked self-attention mechanisms to model the global pixel dependency in input images. Simultaneously, we optimize a second, local pixel dependency stream to mine local manipulation clues within forged input images. In addition, we design novel Learning-to-Weight Modules (LWM) to combine features from the two streams, thereby enhancing the final forgery localization performance. To improve the training process, we propose a novel Pixel-Inconsistency Data Augmentation (PIDA) strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces. This work establishes a comprehensive benchmark integrating 16 representative detection models across 12 datasets. Extensive experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints and achieves state-of-the-art generalization and robustness in image manipulation localization.
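The demosaicing rationale in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes an RGGB Bayer layout, uses only the green channel, and substitutes plain 4-neighbour averaging for a real ISP's demosaicing algorithm. The function names (`bayer_green_mask`, `bilinear_demosaic_green`, `interpolation_residual`) are hypothetical. It shows that interpolated pixels in a pristine image are exactly predictable from their neighbours, and that pasting unrelated content breaks this regularity.

```python
import numpy as np

def bayer_green_mask(h, w):
    """Boolean mask of green sample sites, assuming an RGGB Bayer layout."""
    rows, cols = np.indices((h, w))
    return (rows + cols) % 2 == 1

def bilinear_demosaic_green(green, mask):
    """Fill non-sampled green values with the mean of the sampled 4-neighbours."""
    sampled = np.where(mask, green, 0.0)
    vals = np.pad(sampled, 1)
    valid = np.pad(mask.astype(float), 1)
    # Sum and count of the up/down/left/right neighbours that were sampled.
    neigh_sum = (vals[:-2, 1:-1] + vals[2:, 1:-1]
                 + vals[1:-1, :-2] + vals[1:-1, 2:])
    neigh_cnt = (valid[:-2, 1:-1] + valid[2:, 1:-1]
                 + valid[1:-1, :-2] + valid[1:-1, 2:])
    interpolated = neigh_sum / np.maximum(neigh_cnt, 1.0)
    return np.where(mask, green, interpolated)

def interpolation_residual(img, mask):
    """Gap between each non-sampled pixel and its neighbour-mean prediction."""
    pred = bilinear_demosaic_green(img, mask)
    return np.abs(img - pred) * (~mask)

rng = np.random.default_rng(0)
scene = rng.random((8, 8))                # stand-in for the true green channel
mask = bayer_green_mask(8, 8)
pristine = bilinear_demosaic_green(scene, mask)

forged = pristine.copy()
forged[2:5, 2:5] = rng.random((3, 3))     # "splice" unrelated content

res_pristine = interpolation_residual(pristine, mask)
res_forged = interpolation_residual(forged, mask)
```

In this toy setting the residual is zero everywhere for the pristine image and becomes non-zero in and around the spliced block, which is the kind of pixel-level inconsistency the abstract refers to.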

Paper Structure (26 sections, 12 equations, 15 figures, 16 tables)

Introduction
Related Work
Manipulation detection and localization methods using low-level traces
Learning-based Manipulation detection and localization methods
Pixel Dependency Modeling
Proposed Method
Overall Framework
Global Pixel Dependency Modeling
Local Pixel Dependency Modeling
Learning-to-Weight Module
Pixel-Inconsistency Data Augmentation
Objective Function
Experiments and Results
Datasets
Evaluation metrics
...and 11 more sections

Figures (15)

Figure 1: Illustration of manipulation types: splicing, copy-move, and inpainting. The top, middle, and bottom rows show the real, forgery, and ground-truth images.
Figure 2: Typical forgery image construction pipeline.
Figure 3: Typical Color Filter Array (CFA) types: (a) Bayer CFA; (b) RGBE; (c) CMY; (d) CMYG.
Figure 4: Proposed image manipulation localization framework. The input image is split into several patches, which are simultaneously fed forward to the Local Pixel Dependency Encoder and Global Pixel Dependency Encoder. The upper stream comprises four Difference Convolution (DC) blocks to capture local pixel inconsistencies in forged images. Meanwhile, the Global Pixel Dependency Encoder, which incorporates four masked self-attention (Masked SA) blocks, focuses on modeling long-range statistics within the input images. Four Learning-to-Weight Modules (LWM) have been devised to combine global and local features extracted by the two encoders. The Forgery Decoder and Boundary Decoder take the aggregated features as inputs and predict the final forgery and boundary maps.
Figure 5: (a) Illustration of the proposed masked attention mechanism. $\otimes$ denotes matrix multiplication. Q, K, and V stand for Query, Key, and Value. We designed the raster-scan mask to model the pixel dependency. (b) The mask and the corresponding pixel scan ordering. Green squares indicate the value '1', while red squares indicate the value '0'.
...and 10 more figures
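The masked attention described in the Figure 5 caption can be sketched roughly as follows. This is a minimal single-head NumPy illustration, not the authors' implementation. It assumes the raster-scan mask is lower-triangular over pixels flattened in raster order (entry '1' where a pixel may attend, '0' where it may not), and that each pixel may attend to itself; the paper's actual mask may differ. The names `raster_scan_mask` and `masked_self_attention` and the random projection matrices are hypothetical.

```python
import numpy as np

def raster_scan_mask(num_pixels):
    """mask[i, j] = 1 if pixel j is at or before pixel i in raster order (assumed)."""
    return np.tril(np.ones((num_pixels, num_pixels)))

def masked_self_attention(x, w_q, w_k, w_v, mask):
    """Single-head scaled dot-product attention with a binary mask."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    # Positions where the mask is 0 get -inf, so softmax assigns them zero weight.
    scores = np.where(mask == 1, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 16, 8                               # a 4x4 patch flattened in raster order
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
mask = raster_scan_mask(n)
out, weights = masked_self_attention(x, w_q, w_k, w_v, mask)
```

Under this assumption, the output at each pixel depends only on pixels that precede it in the scan order, so perturbing a later pixel leaves earlier outputs unchanged.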
