Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization
Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu
TL;DR
This work tackles the challenge of detecting and localizing multi-face forgeries by introducing MoNFAP, a unified framework that jointly predicts image-level authenticity and pixel-level tampered regions. It combines a Forgery-aware Unified Predictor (FUP), which uses token learning and forgery-aware transformers to link classification with localization, with a Mixture-of-Noises Module (MNM), which injects diverse noise cues through a four-expert MoNE architecture to strengthen forgery cues in RGB features. The approach uses a multi-scale strategy to detect small manipulated regions and an MoE-inspired gating mechanism with an Importance Loss to balance expert usage. Experiments on curated multi-face datasets (OFV2, FFIW-derived variants, and Manual-Fake) show state-of-the-art localization performance, strong cross-dataset generalization, and robustness to real-world perturbations, supporting its practical value for reliable multi-face forgery analysis.
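The gating and load-balancing idea summarized above can be illustrated with a small, framework-free sketch. It assumes a softmax gate over four noise experts, additive fusion of the gated noise features into the RGB feature, and the importance loss of Shazeer et al. (2017), i.e., a weighted squared coefficient of variation of the per-expert gate mass over a batch. The names `fuse_noise_experts` and `importance_loss`, the additive fusion, and the weight `w_importance` are hypothetical illustrations, not MoNFAP's actual implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_noise_experts(
    rgb_feature: Sequence[float],
    expert_features: Sequence[Sequence[float]],
    gate_logits: Sequence[float],
) -> List[float]:
    """Hypothetical fusion: RGB feature plus the gate-weighted sum of expert (noise) features."""
    weights = softmax(gate_logits)
    return [
        rgb + sum(w * feat[d] for w, feat in zip(weights, expert_features))
        for d, rgb in enumerate(rgb_feature)
    ]


def importance_loss(batch_gate_logits: Sequence[Sequence[float]], w_importance: float = 0.01) -> float:
    """Shazeer et al. (2017)-style importance loss: w * CV(importance)^2.

    importance[e] sums expert e's gate weight over the batch; CV is the
    coefficient of variation (std / mean) across experts. The loss is zero
    when every expert receives the same total gate weight.
    """
    gates = [softmax(g) for g in batch_gate_logits]
    num_experts = len(gates[0])
    importance = [sum(g[e] for g in gates) for e in range(num_experts)]
    mean = sum(importance) / num_experts
    var = sum((x - mean) ** 2 for x in importance) / num_experts
    return w_importance * var / (mean ** 2)


if __name__ == "__main__":
    uniform = [[0.0, 0.0, 0.0, 0.0]] * 8     # gate spreads weight evenly over 4 experts
    collapsed = [[10.0, 0.0, 0.0, 0.0]] * 8  # gate routes almost everything to expert 0
    print(importance_loss(uniform))    # 0.0
    print(importance_loss(collapsed))  # about 0.03
```

A collapsed gate is penalized because one expert accumulates most of the batch's gate mass, which raises the coefficient of variation; adding this term to the training objective nudges the gate toward using all noise experts rather than a single one.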
Abstract
With the advancement of face manipulation technology, forged images in multi-face scenarios are becoming an increasingly complex and realistic challenge. Despite this, detection and localization methods for such multi-face manipulations remain underdeveloped. Traditional manipulation localization methods either derive detection results indirectly from localization masks, which limits detection performance, or employ a naive two-branch structure to obtain detection and localization results simultaneously, which does little to improve localization because the two tasks interact only weakly. This paper proposes a new framework, MoNFAP, tailored specifically for multi-face manipulation detection and localization. MoNFAP introduces two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM). The FUP integrates the detection and localization tasks using a token learning strategy and multiple forgery-aware transformers, which allows classification information to be used to enhance localization. In addition, motivated by the crucial role of noise information in forgery detection, the MNM leverages multiple noise extractors, based on the concept of a mixture of experts, to enhance the general RGB features and further boost the performance of our framework. Finally, we establish a comprehensive benchmark for multi-face detection and localization, on which the proposed MoNFAP achieves strong performance. The code will be made available.
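The abstract does not spell out how the FUP ties the two tasks together. A common pattern in token-based segmentation (for example, MaskFormer-style mask classification) is to refine a learnable query token by cross-attention over pixel features, then use that one token both for an image-level score and, through dot products with pixel features, for a mask. The sketch below assumes exactly that pattern; `unified_predict`, `cross_attend`, the single token, the residual update, and the classifier weights are hypothetical choices, not the authors' FUP.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_attend(token: Sequence[float], pixels: Sequence[Sequence[float]]) -> List[float]:
    """One single-head cross-attention step: the token queries every pixel feature."""
    scale = 1.0 / math.sqrt(len(token))
    scores = [dot(token, p) * scale for p in pixels]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Residual update: token plus the attention-weighted average of pixel features.
    return [t + sum(a * p[d] for a, p in zip(attn, pixels)) for d, t in enumerate(token)]


def unified_predict(
    token: Sequence[float],
    pixels: Sequence[Sequence[float]],
    cls_weight: Sequence[float],
    cls_bias: float = 0.0,
    num_layers: int = 2,
) -> Tuple[float, List[float]]:
    """Return (image-level fake probability, per-pixel tamper probabilities) from one shared token."""
    for _ in range(num_layers):
        token = cross_attend(token, pixels)
    fake_prob = sigmoid(dot(cls_weight, token) + cls_bias)  # detection head
    mask_probs = [sigmoid(dot(token, p)) for p in pixels]   # localization via token-pixel similarity
    return fake_prob, mask_probs


if __name__ == "__main__":
    pixels = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]]  # toy 3-pixel feature map with 2 channels
    prob, mask = unified_predict([1.0, 0.0], pixels, cls_weight=[1.0, 0.0])
    print(round(prob, 3), [round(m, 3) for m in mask])
```

Because the same token drives both heads in this sketch, gradients from the image-level label also shape the token that produces the mask, which is one simple way classification information can inform localization.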
