
Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer

Haoyu Zhang, Raghavendra Ramachandra, Kiran Raja, Christoph Busch

TL;DR

This work tackles the detection of face morphing attacks from a single image in open-set scenarios, a capability critical for border control and passport verification. It introduces a generalized S-MAD method that uses a pretrained Vision Transformer to extract deep representations from cropped face regions, followed by a linear SVM for binary classification. The method is evaluated on an FRGC-V2-based morph dataset covering five morphing algorithms and three image-processing types (digital, print-scan, print-scan with compression), and is benchmarked against several SOTA MAD baselines. Results indicate improved generalizability for digital images in inter-dataset tests and competitive performance in print-scan scenarios; limitations include reduced accuracy on degraded image types and a need for broader pretraining and fusion strategies. The work offers a scalable, generalizable MAD approach suitable for integration into real-world FRS validation pipelines and sets the stage for further improvements and external validation.
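The pipeline above has two stages: a pretrained ViT turns each cropped face into an embedding, and a linear SVM separates morph embeddings from bona fide ones. The sketch below covers only the second stage and is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are assumed to be available as plain float vectors (their dimensionality depends on the ViT backbone, which is not specified here), the label convention (+1 = morph, -1 = bona fide) is assumed, and the Pegasos-style sub-gradient trainer and the helper names `train_linear_svm` and `svm_score` are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    Minimizes (lam / 2) * ||w||^2 + mean hinge loss. Labels are +1 for a
    morph (attack) and -1 for a bona fide image (assumed convention). A
    constant 1.0 is appended to every vector so the last weight acts as a
    bias; it is regularized too, which is a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step: gradient of the L2 regularizer.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step, applied only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed decision value; > 0 means 'morph' under the assumed labels."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Toy 2-D stand-ins for ViT embeddings (hypothetical values, not paper data).
morph = [[2.0, 1.5], [2.5, 2.0], [1.5, 2.5]]
bona_fide = [[-2.0, -1.5], [-2.5, -2.0], [-1.5, -2.5]]
X = morph + bona_fide
y = [1] * len(morph) + [-1] * len(bona_fide)

w = train_linear_svm(X, y)
print(svm_score(w, [3.0, 3.0]) > 0)  # True: classified as morph
```

A real pipeline would more likely call an established solver such as scikit-learn's `LinearSVC`; the hand-rolled loop only makes the hinge-loss mechanics of a linear SVM visible.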

Abstract

Face morphing attacks pose severe threats to Face Recognition Systems (FRS) deployed in border control and passport issuance use cases. Correspondingly, morphing attack detection (MAD) algorithms are needed to defend against such attacks. MAD approaches must be robust enough to handle unknown attacks in an open-set scenario, where attacks can originate from various morphing generation algorithms, post-processing steps and a diversity of printers/scanners. The generalization problem is even more pronounced when the detection has to be made on a single suspected image. In this paper, we propose a generalized single-image-based MAD (S-MAD) algorithm that learns encodings from a Vision Transformer (ViT) architecture. Compared to CNN-based architectures, the ViT model has the advantage of integrating local and global information and may therefore be better suited to detecting morphing traces that are widely distributed across the face region. Extensive experiments are carried out on face morphing datasets generated from the publicly available FRGC face dataset. Several state-of-the-art (SOTA) MAD algorithms, including representative ones that have been publicly evaluated, were selected and benchmarked against our ViT-based approach. The results demonstrate improved detection performance of the proposed S-MAD method under the inter-dataset testing protocol (different data used for training and testing) and comparable performance under the intra-dataset testing protocol (the same data used for training and testing).
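The abstract distinguishes intra-dataset from inter-dataset testing. One plausible way to enumerate such a protocol is sketched below, assuming that each "dataset" is the morph set produced by one morphing algorithm under one image-processing type, and that train/test pairs are formed only within the same processing type. Both assumptions are illustrative, and the labels `A1` to `A5` and the helper `protocol_pairs` are hypothetical placeholders, since the five morphing algorithms are not named in this summary.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical placeholders: five morphing algorithms, three processing types.
ALGORITHMS = ["A1", "A2", "A3", "A4", "A5"]
PROCESSING = ["digital", "print-scan", "print-scan-compression"]

Dataset = Tuple[str, str]          # (algorithm, processing type)
Pair = Tuple[Dataset, Dataset]     # (training set, test set)


def protocol_pairs(
    algorithms: Sequence[str], processing: Sequence[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Split all (train, test) combinations into intra- and inter-dataset pairs.

    Assumption: train and test always share the same processing type;
    a pair is intra-dataset when the morphing algorithm is also the same.
    """
    intra: List[Pair] = []
    inter: List[Pair] = []
    for proc in processing:
        for train_alg, test_alg in product(algorithms, repeat=2):
            pair = ((train_alg, proc), (test_alg, proc))
            (intra if train_alg == test_alg else inter).append(pair)
    return intra, inter


intra, inter = protocol_pairs(ALGORITHMS, PROCESSING)
print(len(intra), len(inter))  # 15 60
```

Under these assumptions, each processing type yields 5 intra-dataset and 20 inter-dataset evaluations; inter-dataset pairs are the ones that probe generalization to unseen morphing algorithms.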

Paper Structure

This paper contains 9 sections, 3 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Hypothesised illustration of S-MAD as an open-set problem: a model trained on known morphing attacks may fail to detect unknown morphing attacks.
  • Figure 2: Overview of our proposed method using pretrained Vision Transformer model.
  • Figure 3: Boxplots of the D-EER computed for all cross-dataset testing results on the FRGC morph database. (a) Ensemble Features (b) Hybrid Features (c) Deep Features (d) Steerable Features (e) Multi-Modality (f) Residual AutoEncoder (g) Proposed Method. (1): Digital (2): Print-scan (3): Print-scan Compression
  • Figure 4: t-SNE plot of the feature space used in the proposed method with digital images
  • Figure 5: t-SNE plot of the feature space used in the proposed method with print-scanned images
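Figure 3 reports results as D-EER, the detection equal error rate: the operating point at which APCER (morphs wrongly accepted as bona fide) equals BPCER (bona fide images wrongly flagged as morphs). The sketch below is a minimal discrete approximation by threshold sweep, not the evaluation code used in the paper. It assumes higher scores mean "more likely a morph" and that a score at or above the threshold is flagged; the helper names and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(attack: Sequence[float], bona_fide: Sequence[float],
                tau: float) -> Tuple[float, float]:
    """APCER and BPCER at threshold tau; a score >= tau is flagged as a morph."""
    apcer = sum(s < tau for s in attack) / len(attack)         # morphs missed
    bpcer = sum(s >= tau for s in bona_fide) / len(bona_fide)  # bona fide rejected
    return apcer, bpcer


def d_eer(attack: Sequence[float], bona_fide: Sequence[float]) -> float:
    """Discrete D-EER: mean of APCER and BPCER where the two are closest."""
    thresholds = sorted(set(attack) | set(bona_fide)) + [float("inf")]
    best_gap, best_eer = float("inf"), 1.0
    for tau in thresholds:
        apcer, bpcer = error_rates(attack, bona_fide, tau)
        gap = abs(apcer - bpcer)
        if gap < best_gap:
            best_gap, best_eer = gap, (apcer + bpcer) / 2
    return best_eer


# Hypothetical detector scores (higher = more likely a morph).
attack_scores = [0.9, 0.8, 0.7, 0.4]
bona_fide_scores = [0.1, 0.2, 0.3, 0.6]
print(d_eer(attack_scores, bona_fide_scores))  # 0.25
```

Multiplying by 100 gives the percentage form usually reported. A boxplot such as Figure 3 would then summarize one such value per cross-dataset train/test pair.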