DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

Xinxu Ge, Xin Liu, Zitong Yu, Jingang Shi, Chun Qi, Jie Li, Heikki Kälviäinen

TL;DR

This paper proposes the DiffFAS framework, which quantifies image quality and feeds it to the network as prior information to counter image quality shift, and performs diffusion-based, high-fidelity cross-domain and cross-attack-type generation to counter image style shift.

Abstract

Face anti-spoofing (FAS) plays a vital role in protecting face recognition (FR) systems from presentation attacks. FAS systems currently face the challenge of domain shift, which degrades the generalization performance of existing FAS methods. In this paper, we rethink the nature of domain shift and decompose it into two factors: image style and image quality. Quality influences the purity with which spoof information is presented, while style affects the manner in which it is presented. Based on this analysis, we propose the DiffFAS framework, which quantifies quality as prior information fed into the network to counter image quality shift, and performs diffusion-based, high-fidelity cross-domain and cross-attack-type generation to counter image style shift. DiffFAS transforms easily collectible live faces into high-fidelity attack faces with precise labels while keeping the live and spoof face identities consistent, which also alleviates the scarcity of labeled data for novel attack types faced by current FAS systems. We demonstrate the effectiveness of our framework on challenging cross-domain and cross-attack FAS datasets, achieving state-of-the-art performance. The project is available at https://github.com/murphytju/DiffFAS.

Paper Structure (19 sections, 18 equations, 13 figures, 10 tables)


Figures (13)

  • Figure 1: DiffFAS identifies quality and style as separable elements (see the left part) in images and enables cross-domain, cross-attack generation (see the right part) to counteract style discrepancies due to domain shifts.
  • Figure 2: The proposed DiffFAS generative framework is a UNet-based network composed of a spoofing style encoder and a noise prediction module, trained with a Spoofing Style Pool (the first column) and Live-Spoof Pair Images (the second column). The spoofing style encoder extracts the texture of a randomly selected spoof image and obtains multi-scale features from different encoder layers. We design an asymmetric Spoofing Style Fusion Module (STFM) to reduce the identity information introduced by the conditional branch and to aggregate information with the backbone through cross-attention. This allows the network to fully capture the spoof texture and achieve high-fidelity spoof synthesis. During the inference stage, we employ image editing techniques in the sampling process, which gives more control over inference and further improves the consistency between the generated ID and the original ID.
  • Figure 3: (a) Example BRISQUE scores of live samples from different domains; a lower score means higher quality. (b) Visualization of the equivalent additive margin function $F(b_{rq}, \theta) = \cos(\theta + \psi) - \omega - \cos(\theta)$ for scale coefficients 0.2 and 0.4. As training progresses, the equivalent additive margin becomes larger for HQ samples and, conversely, smaller for LQ images.
  • Figure 4: Generated samples for cross-attack on PADISI (top) and cross-domain on OCIM (bottom). The first row is the ID image, and the second row is the randomly selected guide image.
  • Figure 5: Visual comparison with various baselines. DiffFAS achieves a balance between identity and spoof texture.
  • ...and 8 more figures
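
The equivalent additive margin in the Figure 3 caption, $F(b_{rq}, \theta) = \cos(\theta + \psi) - \omega - \cos(\theta)$, can be evaluated numerically to see how the two margin terms shift the cosine logit. The sketch below is only an illustration of that formula: this section does not state how $\psi$ and $\omega$ are derived from the quality term $b_{rq}$, so they are treated here as free inputs, and the function name and sample values are hypothetical.

```python
import math


def equivalent_additive_margin(theta, psi, omega):
    """Evaluate F = cos(theta + psi) - omega - cos(theta).

    theta : angle (radians) between a feature and its class center
    psi   : angular margin term (assumed to be given; its dependence on
            the quality term b_rq is not specified in this section)
    omega : additive margin term (same assumption as psi)
    """
    return math.cos(theta + psi) - omega - math.cos(theta)


if __name__ == "__main__":
    theta = math.pi / 3  # hypothetical angle, chosen only for illustration
    # Hypothetical (psi, omega) pairs; not taken from the paper.
    for psi, omega in [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.3)]:
        f = equivalent_additive_margin(theta, psi, omega)
        print(f"psi={psi:+.1f}, omega={omega:.1f} -> F={f:+.4f}")
```

With both terms set to zero the function returns 0, meaning the logit is unchanged. A positive $\psi$ (for angles where $\theta + \psi \le \pi$) or a positive $\omega$ makes $F$ negative, meaning the logit is lowered relative to the plain cosine.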