FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection
Yangxiang Zhang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong
TL;DR
FastForensics targets real-time image manipulation detection on portable devices with a lightweight two-stream architecture. A cognitive branch built from Efficient Wavelet-guided Transformer Blocks and an inspective branch built from convolutions interact bidirectionally. The model is efficient, with approximately 8M parameters, and uses a novel Interactive Wavelet-guided Self-Attention mechanism that incorporates wavelet subband information to capture global traces while preserving local detail. Ablation studies show the effectiveness of wavelet guidance, shared global queries, and cross-branch knowledge exchange. The method achieves competitive F1 and AUC at substantially higher FPS than heavier baselines. This work enables practical, on-device manipulation detection, offering timely authenticity signals for social platforms and other real-world applications.
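The TL;DR refers to wavelet subband information. To make the term concrete, the sketch below implements one level of the standard orthonormal 2D Haar discrete wavelet transform in plain Python. The choice of the Haar wavelet and the function name `haar_dwt2` are assumptions for illustration; the summary does not state which wavelet FastForensics uses. The example shows why such subbands are useful for forensics: a flat region yields zero detail coefficients, while a locally altered pixel shows up in the high-frequency subbands.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar DWT (hypothetical helper).

    Each 2x2 block [[a, b], [c, d]] yields one coefficient per subband:
      LL = (a + b + c + d) / 2   low-pass in both directions
      LH = (a + b - c - d) / 2   top row minus bottom row
      HL = (a - b + c - d) / 2   left column minus right column
      HH = (a - b - c + d) / 2   diagonal detail
    Subband naming and sign conventions differ between libraries.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 2)
            lh_row.append((a + b - c - d) / 2)
            hl_row.append((a - b + c - d) / 2)
            hh_row.append((a - b - c + d) / 2)
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


# A flat 4x4 image with one "tampered" pixel: only the affected 2x2 block
# produces non-zero detail (LH/HL/HH) coefficients.
image = [[10.0] * 4 for _ in range(4)]
image[1][1] = 14.0
ll, lh, hl, hh = haar_dwt2(image)
print(hh)  # [[2.0, 0.0], [0.0, 0.0]]
```

Because the transform is orthonormal, the total energy (sum of squared values) is preserved across the four subbands, so the detail subbands isolate the high-frequency content without losing information.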
Abstract
With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection. Our method consists of two branches targeting the cognitive and inspective perspectives. In the cognitive branch, we propose efficient wavelet-guided Transformer blocks to capture global manipulation traces related to frequency. Each block contains an interactive wavelet-guided self-attention module that integrates wavelet transformation with an efficient attention design and interacts with knowledge from the inspective branch. The inspective branch consists of simple convolutions that capture fine-grained traces and interact bidirectionally with the Transformer blocks to provide mutual support. Our method is lightweight ($\sim$8M parameters) yet achieves competitive performance compared with many counterparts, demonstrating its efficacy in image manipulation detection and its potential for integration into portable devices.
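The abstract says the interactive wavelet-guided self-attention module combines wavelet transformation with an efficient attention design, but it does not specify how. One common way to make attention cheaper is to compute keys and values from a downsampled feature map. The sketch below, an illustrative assumption rather than the authors' module, uses the per-channel Haar LL subband for that downsampling, which cuts the number of key/value tokens by a factor of four. The names `haar_ll` and `wavelet_reduced_attention` are hypothetical, the query/key/value projections are identity maps, and the interaction with the inspective branch is not modeled.

```python
import math
from typing import List

Vec = List[float]
Grid = List[List[Vec]]  # H x W grid of C-dimensional feature vectors


def haar_ll(grid: Grid) -> Grid:
    """Per-channel Haar LL subband: each 2x2 block of vectors maps to (a + b + c + d) / 2."""
    h, w, ch = len(grid), len(grid[0]), len(grid[0][0])
    if h % 2 or w % 2:
        raise ValueError("grid height and width must both be even")
    return [
        [
            [
                (grid[i][j][k] + grid[i][j + 1][k] + grid[i + 1][j][k] + grid[i + 1][j + 1][k]) / 2
                for k in range(ch)
            ]
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def wavelet_reduced_attention(grid: Grid) -> List[Vec]:
    """Hypothetical single-head attention with wavelet-reduced keys/values.

    Queries: all H*W tokens. Keys/values: the H*W/4 tokens of the LL subband.
    Q/K/V projections are identity maps (an assumption; no learned weights).
    """
    queries = [v for row in grid for v in row]
    keys_values = [v for row in haar_ll(grid) for v in row]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product scores against the reduced token set.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the reduced value vectors, channel by channel.
        outputs.append([sum(wt * v[c] for wt, v in zip(weights, keys_values)) for c in range(dim)])
    return outputs


# 4x4 grid with 2 channels: 16 queries attend over 4 LL tokens,
# i.e. 16 * 4 = 64 scores instead of 16 * 16 = 256 for full self-attention.
features = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
out = wavelet_reduced_attention(features)
print(len(out), len(out[0]))  # 16 2
```

Each LL coefficient summarizes a 2x2 neighbourhood, so the attention matrix shrinks by 4x at the cost of spatial resolution in the keys. The high-frequency subbands, which carry much of the manipulation signal, would have to be injected separately; this sketch omits that step.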
