Generalized Face Liveness Detection via De-fake Face Generator

Xingming Long, Jie Zhang, Shiguang Shan

TL;DR

This paper tackles the domain generalization (DG) challenge in face anti-spoofing (FAS) by leveraging abundant real-face data. It introduces a De-fake Face Generator (DFG), trained only on real faces, that synthesizes a real-looking version of any input. The residual between the input and this generated face serves as an anomalous cue, which an Off-real Attention Network (OA-Net) uses to focus on spoof regions. The authors provide a theoretical analysis of the distinguishability of real and fake cues and report state-of-the-art cross-domain performance across nine public datasets, with ablations validating the design choices. The plug-and-play OA-Net can be combined with existing DG-based FAS methods, offering a practical path toward robust liveness detection in diverse real-world settings.

Abstract

Previous Face Anti-spoofing (FAS) methods struggle to generalize to unseen domains, mainly because most existing FAS datasets are relatively small and lack data diversity. Thanks to the development of face recognition over the past decade, numerous real face images are publicly available, yet they have been neglected by the existing literature. In this paper, we propose an Anomalous cue Guided FAS (AG-FAS) method, which effectively leverages large-scale additional real faces to improve model generalization via a De-fake Face Generator (DFG). Specifically, by training on a large-scale dataset containing only real faces, the generator learns what a real face should look like and can therefore generate a "real" version of any input face image. Consequently, the difference between the input face and the generated "real" face can be treated as an attention cue for fake feature learning. Building on these ideas, we propose an Off-real Attention Network (OA-Net), which allocates its attention to the spoof region of the input according to the anomalous cue. Extensive experiments on a total of nine public datasets show that our method achieves state-of-the-art results under cross-domain evaluations with unseen scenarios and unknown presentation attacks. In addition, we provide a theoretical analysis demonstrating the effectiveness of the proposed anomalous cues.
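
The anomalous cue described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_generator` is a hypothetical stand-in for the DFG (the real one is a learned generative model), and the signed difference `x - x_real` is an assumed convention for "the difference between the input face and the generated real face".

```python
import numpy as np

def anomalous_cue(x, generator):
    """Residual between an input image and its generated "real" version.

    `generator` is any callable mapping an image array to an array of the
    same shape. The sign convention (input minus generated) is an assumption.
    """
    x = np.asarray(x, dtype=np.float32)
    x_real = np.asarray(generator(x), dtype=np.float32)
    if x_real.shape != x.shape:
        raise ValueError("generator must preserve the input shape")
    return x - x_real

def toy_generator(x):
    # Hypothetical stand-in for the DFG: pretend real faces only contain
    # intensities in [0.2, 0.6] and project the input into that range.
    return np.clip(x, 0.2, 0.6)

# A "real" image already lies in the toy real-face range.
real = np.full((4, 4), 0.5, dtype=np.float32)

# A "spoof" image carries an out-of-range patch (e.g. a glossy reflection).
spoof = real.copy()
spoof[1:3, 1:3] = 0.9

cue_real = anomalous_cue(real, toy_generator)
cue_spoof = anomalous_cue(spoof, toy_generator)
```

In this toy setting the cue vanishes for the real image and is non-zero only on the altered patch of the spoof image, which is the property the attention network is meant to exploit.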

Paper Structure (29 sections, 22 equations, 5 figures, 8 tables)


Figures (5)

  • Figure 1: Structure of the proposed De-fake Face Generator (DFG). The DFG trained on a large-scale real face dataset can generate a corresponding “real” face of any given input face when taking the input's identity feature $E_{id}(x)$ as guidance.
  • Figure 2: Structure of the Off-real Attention Network (OA-Net). We take the residual of the input and the corresponding image generated by DFG as the anomalous cue, which guides the OA-Net to obtain a more robust FAS feature via the cross-attention module.
  • Figure 3: The overall application diagram of AG-FAS, where OA-Net is integrated with existing DG-based FAS methods as a Plug-and-Play feature extraction module. $L_{DG}$ represents the feature constraint losses used in DG-based FAS methods, such as the adversarial loss and the asymmetric triplet loss in SSDG (CVPR 2020).
  • Figure 4: Visualization of the generation capabilities for different methods. The top row represents the inputs for each model. Subsequently, the results of each method are presented in two rows: the first row displays the reconstructed images, while the second row illustrates the corresponding anomalous cues.
  • Figure 5: Comparison of the DFG trained without/with the identity feature as the conditional guidance, which displays the images reconstructed by the DFG using different diffusion steps $\hat{t}$. The rightmost column represents the corresponding input.
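
The cross-attention step mentioned in the Figure 2 caption can be sketched as generic scaled dot-product attention. The source does not state which stream supplies queries and which supplies keys and values; the sketch below assumes that tokens from the anomalous cue act as queries and tokens from the input image act as keys and values. All names, shapes, and the random projection matrices are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(cue_tokens, input_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Assumption: cue tokens are queries; input tokens are keys and values.
    """
    q = cue_tokens @ w_q            # (n_cue, d)
    k = input_tokens @ w_k          # (n_in, d)
    v = input_tokens @ w_v          # (n_in, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

# Toy example: 16 patch tokens of width 32, projected to width 8.
rng = np.random.default_rng(0)
cue_tokens = rng.normal(size=(16, 32))
input_tokens = rng.normal(size=(16, 32))
w_q, w_k, w_v = (rng.normal(size=(32, 8)) for _ in range(3))

fused, weights = cross_attention(cue_tokens, input_tokens, w_q, w_k, w_v)
```

Each row of `weights` shows how strongly one cue token attends to every input token, so regions with a large anomalous cue can steer which input features enter the fused representation.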