Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors

Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li

TL;DR

This work tackles the challenge of cross-domain generalization in face anti-spoofing by introducing a fine-grained identity-based domain partition and explicit disentanglement of liveness from identity. The proposed DLIF framework uses two encoders to learn orthogonal latent spaces for liveness ($\mathcal{U}$) and identity ($\mathcal{V}$), reinforced by Style Cross, Channel-wise Style Attention, and the Asymmetric Augmented Instance Contrast loss. Empirical results on four public datasets demonstrate state-of-the-art generalization for cross-dataset and limited-source scenarios, with strong scalability when leveraging well-trained face recognition models and larger identity diversity. The approach provides a practical pathway to robust FAS systems that generalize to unseen domains while remaining efficient and compatible with existing FR models.

Abstract

Face anti-spoofing techniques based on domain generalization have recently been studied widely. Adversarial learning and meta-learning techniques have been adopted to learn domain-invariant representations. However, prior approaches often treat the dataset gap as the primary factor behind domain shifts. This perspective is not fine-grained enough to accurately reflect the intrinsic gap among the data. In our work, we redefine domains based on identities rather than datasets, aiming to disentangle liveness and identity attributes. We focus on suppressing the adverse effect of identity shift by learning identity-invariant liveness representations through orthogonalizing liveness and identity features. To cope with style shifts, we propose a Style Cross module to expand stylistic diversity and a Channel-wise Style Attention module to weaken sensitivity to style shifts, aiming to learn robust liveness representations. Furthermore, acknowledging the asymmetry between live and spoof samples, we introduce a novel contrastive loss, Asymmetric Augmented Instance Contrast. Extensive experiments on four public datasets demonstrate that our method achieves state-of-the-art performance under cross-dataset and limited-source-dataset scenarios. Additionally, our method scales well as the diversity of identities expands. The code will be released soon.
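The idea of orthogonalizing liveness and identity features can be illustrated with a small sketch. The abstract does not give the exact loss, so the mean squared cosine similarity used here, along with the names `cosine` and `orthogonality_penalty`, is an assumption chosen for illustration and not the authors' formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small constant avoids division by zero

def orthogonality_penalty(liveness_feats, identity_feats):
    """Hypothetical penalty: mean squared cosine similarity over paired features.

    It is zero when each liveness vector is orthogonal to its paired
    identity vector and approaches one when the pairs are parallel.
    """
    sims = [cosine(u, v) for u, v in zip(liveness_feats, identity_feats)]
    return sum(s * s for s in sims) / len(sims)

# Toy features: two samples, three dimensions each.
u_feats = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]      # stand-in for liveness features
v_orthogonal = [[0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]  # orthogonal to u_feats pairwise
v_aligned = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # parallel to u_feats pairwise

print(orthogonality_penalty(u_feats, v_orthogonal))
print(orthogonality_penalty(u_feats, v_aligned))
```

Minimizing a penalty of this kind pushes the two feature sets toward orthogonal directions, which is one plausible way to keep identity information out of the liveness representation.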

Paper Structure (28 sections, 14 equations, 7 figures, 7 tables, 1 algorithm)


Figures (7)

  • Figure 1: (Left) Orthogonalization of liveness and identity attributes. The earth's axis represents the subspace $\mathcal{U}$ associated with the liveness component, where the green and red arrows indicate "live" and "spoof". The equatorial plane represents the subspace $\mathcal{V}$ associated with the identity component, and colored arrows represent different identities. (Right) In $\mathcal{U}$ space, the liveness of the content template and the style template should be consistent, while in $\mathcal{V}$ space, identity invariance is guaranteed.
  • Figure 2: (Left) The architecture mainly consists of two encoders, $U$ and $V$. $U$ extracts the liveness feature, and $V$ extracts the identity feature. The SC operates in two modes within $U$ and $V$, liveness-invariant and identity-invariant. Dashed lines indicate detachable components, and colors from light to dark represent the low, middle, and high levels of the encoder. The CWSA is used to weaken the model's sensitivity to style variation. (Right) The style-augmented flow of the ($\times$) and ($+$) structures.
  • Figure 3: AAIC yields a compact cluster of live samples and a scattered pattern of spoof samples.
  • Figure 4: Feature distribution of different contrast strategies via t-SNE visualization.
  • Figure 5: Panels (a.-), (b.-), (c.-), (d.-), (e.-), and (f.-) show the feature distributions for the six style augmentation methods L, M, H, M+H, M$\times$H, and SSA, respectively. The suffixes (-.1) and (-.2) indicate whether $U$ is equipped with the CWSA.
  • ...and 2 more figures
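
The captions of Figures 1 and 2 refer to a content template, a style template, and a style-augmented flow. A common way to mix styles between feature maps is to swap channel-wise statistics, and the sketch below shows that general technique. It assumes style is summarized by the per-channel mean and standard deviation, which the captions do not state; the function names `channel_stats` and `swap_style` are hypothetical, and this is not the authors' Style Cross module.

```python
import statistics

def channel_stats(channel):
    """Mean and population standard deviation of one channel's activations."""
    return statistics.fmean(channel), statistics.pstdev(channel)

def swap_style(content, style, eps=1e-6):
    """Re-normalize each content channel to the style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats.
    Assumption: "style" means the per-channel mean and standard deviation.
    """
    stylized = []
    for c_ch, s_ch in zip(content, style):
        c_mean, c_std = channel_stats(c_ch)
        s_mean, s_std = channel_stats(s_ch)
        # Remove the content statistics, then apply the style statistics.
        stylized.append([(x - c_mean) / (c_std + eps) * s_std + s_mean for x in c_ch])
    return stylized

content = [[1.0, 2.0, 3.0]]     # one channel from a content template (toy values)
style = [[10.0, 20.0, 30.0]]    # one channel from a style template (toy values)
mixed = swap_style(content, style)
print(channel_stats(mixed[0]))
```

The output keeps the relative ordering of the content activations while taking on the mean and spread of the style channel, which is the sense in which such a swap increases stylistic diversity without altering the underlying content layout.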