DiffusionFF: A Diffusion-based Framework for Joint Face Forgery Detection and Fine-Grained Artifact Localization

Siran Peng; Haoyuan Zhang; Li Gao; Tianshuo Zhang; Xiangyu Zhu; Bao Li; Weisong Zhao; Zhen Lei

TL;DR

DiffusionFF addresses the dual need for accurate face forgery detection and fine-grained artifact localization with a diffusion-based decoder conditioned on multi-scale features from a pretrained forgery detector. In its encoder-decoder architecture, the detector acts as the artifact encoder and a denoising diffusion model acts as the artifact decoder, generating structural dissimilarity (DSSIM) maps that are fused with high-level detector features to produce the final decision. The model is trained with a two-stage strategy. In experiments on FaceForensics++ (FF++) and cross-dataset benchmarks, DiffusionFF achieves state-of-the-art detection performance and superior artifact localization, while also offering improved explainability. It can also serve as a plug-and-play auxiliary module that enhances existing detectors, though diffusion-based inference remains computationally intensive.

Abstract

The rapid evolution of deepfake technologies demands robust and reliable face forgery detection algorithms. While determining whether an image has been manipulated remains essential, the ability to precisely localize forgery clues is also important for enhancing model explainability and building user trust. To address this dual challenge, we introduce DiffusionFF, a diffusion-based framework that simultaneously performs face forgery detection and fine-grained artifact localization. Our key idea is to establish a novel encoder-decoder architecture: a pretrained forgery detector serves as a powerful "artifact encoder", and a denoising diffusion model is repurposed as an "artifact decoder". Conditioned on multi-scale forgery-related features extracted by the encoder, the decoder progressively synthesizes a detailed artifact localization map. We then fuse this fine-grained localization map with high-level semantic features from the forgery detector, leading to substantial improvements in detection capability. Extensive experiments show that DiffusionFF achieves state-of-the-art (SOTA) performance across multiple benchmarks, underscoring its superior effectiveness and explainability.
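
The TL;DR names the localization target as a DSSIM map. As a rough illustration of what such a map encodes, the sketch below computes a per-pixel structural dissimilarity between a pristine image and a manipulated one. It assumes the common definition DSSIM = (1 - SSIM) / 2, a small square window clipped at the borders, and the usual SSIM constants for intensities in [0, 1]. The function names, window size, and constants are illustrative assumptions, not details taken from the paper.

```python
def _window(img, r, c, radius):
    """Collect pixel values in a square window, clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]


def dssim_map(real, fake, radius=1, c1=0.01 * 0.01, c2=0.03 * 0.03):
    """Per-pixel DSSIM between two grayscale images given as 2D lists in [0, 1].

    Assumption: DSSIM = (1 - SSIM) / 2 with locally computed SSIM statistics.
    """
    rows, cols = len(real), len(real[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            x = _window(real, r, c, radius)
            y = _window(fake, r, c, radius)
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            # Local variances and covariance (population form).
            vx = sum((v - mx) * (v - mx) for v in x) / n
            vy = sum((v - my) * (v - my) for v in y) / n
            cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
            ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
                (mx * mx + my * my + c1) * (vx + vy + c2)
            )
            out[r][c] = (1.0 - ssim) / 2.0
    return out


# Toy example: a flat 5x5 image with one altered pixel in the top-left corner.
real = [[0.5] * 5 for _ in range(5)]
fake = [row[:] for row in real]
fake[0][0] = 0.9
artifact_map = dssim_map(real, fake)
```

In this toy case the map is close to zero wherever the two images agree and rises only in the neighbourhood of the altered pixel, which is the property that makes such a map usable as a localization target.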
