Exploring Depth Information for Detecting Manipulated Face Videos

Haoyue Wang, Sheng Li, Ji He, Zhenxing Qian, Xinpeng Zhang, Shaolin Fan

TL;DR

This paper proposes a Face Depth Map Transformer (FDMT) that estimates the face depth map patch by patch from an RGB face image, which lets it capture the local depth anomalies created by manipulation. It also proposes an RGB-Depth Inconsistency Attention (RDIA) module that captures inter-frame inconsistency for multi-frame input.

Abstract

Face manipulation detection has been receiving a lot of attention because of its importance to the reliability and security of face images and videos. Recent studies focus on using auxiliary information or prior knowledge to capture robust manipulation traces, an approach that has been shown to be promising. The face depth map is an important face feature that has been shown to be effective in other areas such as face recognition and face detection, yet it has received little attention in the literature on face manipulation detection. In this paper, we explore the possibility of incorporating the face depth map as auxiliary information for robust face manipulation detection. To this end, we first propose a Face Depth Map Transformer (FDMT) to estimate the face depth map patch by patch from an RGB face image, which is able to capture the local depth anomaly created by manipulation. The estimated face depth map is then treated as auxiliary information and integrated with the backbone features using a newly designed Multi-head Depth Attention (MDA) mechanism. We also propose an RGB-Depth Inconsistency Attention (RDIA) module to effectively capture the inter-frame inconsistency for multi-frame input. Various experiments demonstrate the advantage of our proposed method for face manipulation detection.

Paper Structure

This paper contains 19 sections, 13 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Examples of the face depth maps and residuals. (a) Depth maps of real face images from different sources (FF++ [rossler2019faceforensics++] and Celeb-DF [li2020celebdf]); (b) depth maps of two consecutive real face frames and the corresponding residual; (c) depth maps of two consecutive fake face frames and the corresponding residual. Please refer to Section \ref{fdmt} for the computation of the ground truth face depth maps.
  • Figure 2: An overview of the proposed method for face manipulation detection.
  • Figure 3: Examples of the estimated face depth map using PRNet [feng2018joint]. Images in the "Real" row show a real face image and its depth map. Images in the "Fake" row show a manipulated face image and its depth map. The face images are selected from FF++ [rossler2019faceforensics++].
  • Figure 4: Examples of the ground truth face depth map. Images in the "Real" row show a real face image and its ground truth. Images in the "Fake" row show a manipulated face image and its ground truth. The face images are selected from FF++ [rossler2019faceforensics++].
  • Figure 5: The network structure of Multi-head Depth Attention.
  • ...and 5 more figures
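
The residual shown in Figure 1 (b) and (c) compares the depth maps of two consecutive frames. As a minimal sketch of that idea, and assuming the residual is an element-wise absolute difference (this page does not state the exact definition), it could be computed as follows. The function names `depth_residual` and `mean_residual` and the toy 2x2 depth maps are hypothetical and serve only as illustration.

```python
from typing import List

DepthMap = List[List[float]]  # assumed layout: rows of per-pixel depth values


def depth_residual(prev: DepthMap, curr: DepthMap) -> DepthMap:
    """Element-wise absolute difference between two same-sized depth maps.

    Assumption: the residual is |curr - prev|; the paper may define it differently.
    """
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("depth maps must have the same shape")
    return [
        [abs(c - p) for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev, curr)
    ]


def mean_residual(residual: DepthMap) -> float:
    """Average residual over all pixels, as a single inconsistency score."""
    values = [v for row in residual for v in row]
    return sum(values) / len(values)


# Toy depth maps for two consecutive frames (made-up values).
frame_a = [[0.5, 0.5], [0.25, 0.75]]
frame_b = [[0.5, 0.75], [0.25, 0.25]]

residual = depth_residual(frame_a, frame_b)
score = mean_residual(residual)
```

A larger mean residual indicates that the two depth maps disagree more. This sketch only makes the "residual" in the caption concrete; it is not the RDIA module, which the abstract describes as an attention mechanism.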