FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection

Yangxiang Zhang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong

TL;DR

FastForensics tackles the need for real-time image manipulation detection on portable devices by introducing a lightweight two-stream architecture with a cognitive branch (Efficient Wavelet-guided Transformer Blocks) and an inspective branch (convolutions) that interact bidirectionally. The model has approximately 8M parameters and uses a novel Interactive Wavelet-guided Self-Attention mechanism to incorporate wavelet subband information for global traces while preserving local detail. Ablation studies show the effectiveness of wavelet guidance, shared global queries, and cross-branch knowledge exchange, and the model achieves competitive F1 and AUC with substantially higher FPS than heavier baselines. This work enables practical, on-device manipulation detection, offering timely authenticity signals for social platforms and other real-world applications.

Abstract

With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection. Our method consists of two branches targeting the cognitive and inspective perspectives. In the cognitive branch, we propose efficient wavelet-guided Transformer blocks to capture the global manipulation traces related to frequency. Each block contains an interactive wavelet-guided self-attention module that integrates wavelet transformation with efficient attention design, interacting with the knowledge from the inspective branch. The inspective branch consists of simple convolutions that capture fine-grained traces and interact bidirectionally with the Transformer blocks to provide mutual support. Our method is lightweight ($\sim$8M parameters) but achieves competitive performance compared to many other counterparts, demonstrating its efficacy in image manipulation detection and its potential for portable integration.
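
The abstract refers to a wavelet transformation that exposes frequency-related traces, but it does not state which wavelet family is used or how the subbands enter the attention module. As a minimal sketch only, assuming a single-level 2D Haar transform (an assumption, not confirmed by the source), the hypothetical helper `haar_dwt2` below shows how an image splits into one low-frequency subband and three detail subbands at half resolution. Subband naming conventions (LH versus HL) differ between libraries, so the comments describe each band by what it measures.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar transform (illustrative assumption).

    Takes a grayscale image with even height and width and returns four
    half-resolution subbands: (approximation, row detail, column detail,
    diagonal detail).
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")

    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, height, 2):
        out_a, out_r, out_c, out_d = [], [], [], []
        for c in range(0, width, 2):
            # One 2x2 block:  a b
            #                 d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out_a.append((a + b + d + e) / 2.0)  # low-pass in both directions
            out_r.append((a + b - d - e) / 2.0)  # top row minus bottom row
            out_c.append((a - b + d - e) / 2.0)  # left column minus right column
            out_d.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(out_a)
        row_detail.append(out_r)
        col_detail.append(out_c)
        diag_detail.append(out_d)
    return approx, row_detail, col_detail, diag_detail


if __name__ == "__main__":
    # A flat region produces no detail response; an intensity step inside a
    # block shows up in the column-detail subband.
    patch = [[0.0, 1.0, 1.0, 1.0],
             [0.0, 1.0, 1.0, 1.0]]
    for name, band in zip(("approx", "row", "col", "diag"), haar_dwt2(patch)):
        print(name, band)
```

In a pipeline of the kind the abstract describes, the detail subbands would be the natural place to look for frequency-related manipulation traces, while the approximation subband keeps a downsampled view of the content. How the paper actually combines them with attention is not specified in this text.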

Paper Structure

This paper contains 9 sections, 5 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: AUC (%), FLOPs (circle area), and FPS of different methods. Our method (red circle) strikes a good balance between performance and efficiency.
  • Figure 2: Overview of the architecture of our method. The blue and orange cuboids represent the features from the inspective branch and the cognitive branch, respectively. EWTB denotes the Efficient Wavelet-guided Transformer Block.
  • Figure 3: Overview of the efficient wavelet-guided Transformer block (EWTB).
  • Figure 4: Overview of the interactive wavelet-guided self-attention (IWSA).
  • Figure 5: Frequency statistics for real and fake images using splicing and object removal.
  • ...and 3 more figures