GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning

Yaning Zhang, Zitong Yu, Tianyi Wang, Xiaobin Huang, Linlin Shen, Zan Gao, Jianfeng Ren

TL;DR

GenFace is a large-scale, diverse, fine-grained, high-fidelity face forgery dataset proposed to advance deepfake detection. It contains a large number of forged faces produced by advanced generators, including diffusion-based models, together with detailed labels for the manipulation approach and the generator used.

Abstract

The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable. Benchmarking and advancing techniques for detecting digital manipulation has therefore become urgent. Although a number of face forgery datasets are publicly available, their forged faces are mostly generated with GAN-based synthesis and do not cover more recent technologies such as diffusion. Because diffusion models have significantly improved the diversity and quality of generated images, a much more challenging face forgery dataset is needed to evaluate state-of-the-art (SOTA) forgery detection methods. In this paper, we propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection. It contains a large number of forged faces generated by advanced generators, such as diffusion-based models, and provides more detailed labels about the manipulation approaches and the generators adopted. In addition to evaluating SOTA approaches on our benchmark, we design an innovative cross appearance-edge learning (CAEL) detector that captures multi-grained appearance and edge global representations and detects discriminative and general forgery traces. Moreover, we devise an appearance-edge cross-attention (AECA) module to explore the various integrations across the two domains. Extensive experimental results and visualizations show that our detection model outperforms the state of the art in different settings, including cross-generator, cross-forgery, and cross-dataset evaluations. Code and datasets will be available at https://github.com/Jenine-321/GenFace

Paper Structure (16 sections, 5 equations, 9 figures, 16 tables)

Figures (9)

  • Figure 1: Taxonomy of the GenFace dataset. At level 1, we divide images into real or fake faces. The second level, the forgery level, classifies forged images into three types: Entire Face Synthesis (EFS), Attribute Manipulation (AM), and Face Swap (FS). Then, we separate images based on whether the forgery approach is diffusion-based or GAN-based. The final level refers to the specific generators. LatDiff is latent diffusion. CollDiff is collaborative diffusion. Diffae is diffusion autoencoders. DiffFace is diffusion face. MFGAN is MaskFaceGAN. LatTrans is latent transformer.
  • Figure 2: The visualization of images produced by different operators. The first row represents the Red Green Blue (RGB) images. Every two columns display the real and fake samples of various manipulations.
  • Figure 3: Schematic illustration of the collection and partitioning of GenFace.
  • Figure 4: The architecture of the proposed model. We first encode fine-grained appearance features, coarse-grained appearance representations, and edge embeddings from the input RGB images and edge images via a feature alignment module. We then feed them into the Multi-grained Appearance-Edge Transformer (MAET) module to capture diverse appearance-edge forgery patterns, global edge features, and diverse mixture representations across the two domains. Finally, these are sent to their respective experts, each of which consists of a fully connected layer, to yield a single output. The outputs of the experts are summed element-wise and then fed into a softmax function to generate the final prediction.
  • Figure 5: The workflow of appearance-edge cross-attention.
  • ...and 4 more figures
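
The expert fusion step described in the Figure 4 caption (one fully connected layer per expert, element-wise sum of the expert outputs, then softmax) can be illustrated with a minimal sketch. This is not the authors' implementation: the number of experts (three), the feature dimension (four), the two output classes, the random weights, and the names `linear`, `softmax`, and `fuse_experts` are all assumptions made only for illustration.

```python
import math
import random


def linear(features, weights, bias):
    """Fully connected layer: one output logit per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(expert_inputs, expert_params):
    """Apply one linear expert per input, sum outputs element-wise, softmax."""
    outputs = [
        linear(x, weights, bias)
        for x, (weights, bias) in zip(expert_inputs, expert_params)
    ]
    summed = [sum(values) for values in zip(*outputs)]
    return softmax(summed)


# Hypothetical setup: three feature vectors standing in for MAET outputs,
# each of dimension 4, classified into 2 classes (real, fake).
random.seed(0)
NUM_EXPERTS, FEATURE_DIM, NUM_CLASSES = 3, 4, 2

expert_inputs = [
    [random.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
    for _ in range(NUM_EXPERTS)
]
expert_params = [
    (
        [[random.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
         for _ in range(NUM_CLASSES)],
        [0.0] * NUM_CLASSES,
    )
    for _ in range(NUM_EXPERTS)
]

probs = fuse_experts(expert_inputs, expert_params)
print(probs)  # two probabilities that sum to 1
```

Summing the expert logits before the softmax means every expert contributes to the same class scores, so a strong signal from any one of them can shift the final prediction.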