Dual Teacher Knowledge Distillation with Domain Alignment for Face Anti-spoofing

Zhe Kong, Wentian Zhang, Tao Wang, Kaihao Zhang, Yuexiang Li, Xiaoying Tang, Wenhan Luo

TL;DR

This work tackles the limited cross-domain generalization of face anti-spoofing (FAS) by introducing DTDA, a framework that combines two components: a domain adversarial attack (DAA), which aligns domains by perturbing the input images and thereby avoids the unstable training of domain adversarial training (DAT), and a dual-teacher knowledge distillation scheme that injects perceptual and generative face priors. The model trains in a multi-task setting, leveraging teachers pretrained on large-scale face tasks and feeding adversarial inputs to both the student and the teachers to maximize shared priors. Across cross-dataset and intra-dataset evaluations on public benchmarks, DTDA consistently outperforms state-of-the-art methods while remaining efficient (e.g., 95 FPS with a ResNet-18 backbone). Ablation and visualization studies corroborate the benefit of combining DAA with dual teachers: the learned features are more domain-invariant, and the priors provide more discriminative guidance for live/spoof classification.

Abstract

Face recognition systems are vulnerable to a variety of presentation attacks, which makes system security an increasingly critical concern. Although many face anti-spoofing (FAS) methods perform well in intra-dataset scenarios, their generalization remains a challenge. To address this issue, some methods adopt domain adversarial training (DAT) to extract domain-invariant features. However, the competition between the encoder and the domain discriminator can make the network difficult to train and slow to converge. In this paper, we propose a domain adversarial attack (DAA) method that mitigates this training instability by adding perturbations to the input images, which makes them indistinguishable across domains and enables domain alignment. Moreover, since models trained on limited data and attack types cannot generalize well to unknown attacks, we propose a dual perceptual and generative knowledge distillation framework for face anti-spoofing that utilizes pre-trained face-related models containing rich face priors. Specifically, we adopt two different face-related models as teachers to transfer knowledge to the target student model. The pre-trained teacher models come not from the face anti-spoofing task but from perceptual and generative tasks, respectively, which implicitly augments the data. By combining DAA with dual-teacher knowledge distillation, we develop a dual teacher knowledge distillation with domain alignment framework (DTDA) for face anti-spoofing. The advantage of the proposed method is verified through extensive ablation studies and comparisons with state-of-the-art methods on public datasets across multiple protocols.
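
The idea of perturbing inputs so that a domain classifier can no longer tell the source domains apart can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical linear softmax domain classifier and a single FGSM-style signed-gradient step that raises the domain classification loss. The names `domain_probs`, `domain_loss`, `daa_perturb`, and the step size `epsilon` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def domain_probs(weights, x):
    """Toy linear domain classifier (assumption): one weight row per domain."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def domain_loss(weights, x, domain):
    """Cross-entropy of the domain classifier for the true domain label."""
    return -math.log(domain_probs(weights, x)[domain])


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def daa_perturb(weights, x, domain, epsilon=0.1):
    """One signed-gradient step that makes the domain harder to recognize.

    For a linear softmax classifier, the gradient of the cross-entropy with
    respect to the input is sum_k (p_k - y_k) * W[k], computed here in closed form.
    """
    probs = domain_probs(weights, x)
    grad = [
        sum(
            (probs[k] - (1.0 if k == domain else 0.0)) * weights[k][j]
            for k in range(len(weights))
        )
        for j in range(len(x))
    ]
    # Ascend the domain loss; each input value moves by at most epsilon.
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]


# Toy usage: three source domains, a two-dimensional "image".
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]
x_adv = daa_perturb(weights, x, domain=0, epsilon=0.1)
loss_before = domain_loss(weights, x, 0)
loss_after = domain_loss(weights, x_adv, 0)
```

In this toy setting the domain loss on `x_adv` is higher than on `x`, meaning the classifier is less certain about the source domain. In a real pipeline the domain classifier would be a deep network and the gradient would come from automatic differentiation; the perturbed images would then be fed to the anti-spoofing model.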

Paper Structure (17 sections, 10 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparison of domain adversarial training (DAT) and the proposed domain adversarial attack (DAA) method. a) The DAT method. A gradient reversal layer (GRL) is inserted after the feature encoder. The encoder is trained to produce domain-invariant features while competing with the domain discriminators throughout learning, which gradually makes the learned features domain-indistinguishable for face anti-spoofing. b) The proposed DAA method. A domain classifier generates perturbations that make the domains of the input images indistinguishable, which encourages the model to learn domain-invariant features for face anti-spoofing and improves its generalization to the target domain.
  • Figure 2: The overall architecture of our face anti-spoofing via dual teachers with domain alignment (DTDA) method. Input faces with differently colored backgrounds represent examples from different domains. During the adversarial attack stage, a domain classifier generates perturbations that make the domains of the input images indistinguishable, thereby forcing the model to learn domain-invariant features. In the dual teacher knowledge distillation stage, two teacher models transfer face priors to the student model. The two teachers come from perceptual and generative tasks and are trained on millions of face images, which further improves the generalization of the model.
  • Figure 3: ROC curves for the ablation study under cross-dataset testing. FR denotes the face recognition network, FE the face attribute editing network, and DAA the domain adversarial attack method.
  • Figure 4: The t-SNE visualizations of the extracted features under O&M&I to C. a) Visualization results before using the DTDA method. b) Visualization results after using the DTDA method. Distinct colors are used to signify features originating from live images, spoof images, or features derived from different domains.
  • Figure 5: Grad-CAM visualizations of the proposed method under O&M&I to C. The images in the red box show the visualizations of live faces, and the images in the green box show the visualizations of spoof faces. a) Original image. b) Visualization results using the ImageNet pre-trained model. c) Visualization results using the face attribute editing model. d) Visualization results using the face recognition model. e) Visualization results combining the face attribute editing and face recognition models.
  • ...and 1 more figures
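
The dual teacher stage described in the Figure 2 caption, where a perceptual teacher and a generative teacher both transfer face priors to the student, can be sketched as a combined training objective. The example below is a minimal illustration under stated assumptions, not the paper's loss: it assumes feature matching with mean squared error against each teacher plus a binary cross-entropy live/spoof term, and the names `dual_teacher_loss`, `w_perceptual`, and `w_generative` are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def binary_cross_entropy(p_live, label):
    """Live/spoof classification loss; label is 1 for live and 0 for spoof."""
    eps = 1e-12
    p = min(max(p_live, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def dual_teacher_loss(student_feat, perceptual_feat, generative_feat,
                      p_live, label, w_perceptual=1.0, w_generative=1.0):
    """Classification loss plus one distillation term per teacher (assumed form)."""
    cls_term = binary_cross_entropy(p_live, label)
    perceptual_term = mse(student_feat, perceptual_feat)
    generative_term = mse(student_feat, generative_feat)
    return cls_term + w_perceptual * perceptual_term + w_generative * generative_term


# Toy usage: the student matches the perceptual teacher but not the generative one.
student = [0.5, -0.5]
teacher_perceptual = [0.5, -0.5]
teacher_generative = [1.5, -0.5]
matched = dual_teacher_loss(student, student, student, p_live=0.9, label=1)
mismatched = dual_teacher_loss(student, teacher_perceptual, teacher_generative,
                               p_live=0.9, label=1)
```

When the student features agree with both teachers, only the classification term remains; any disagreement with either teacher adds to the objective. The weights let one teacher be emphasized over the other, and in practice the teachers would be frozen networks whose features are computed on the same (possibly perturbed) inputs as the student.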