LoRA-Enhanced Vision Transformer for Single Image based Morphing Attack Detection via Knowledge Distillation from EfficientNet

Ria Shekhawat, Sushrut Patwardhan, Raghavendra Ramachandra, Praveen Kumar Chandaliya, Kishor P. Upla

TL;DR

This work tackles morphing attacks on Face Recognition Systems by introducing a knowledge-distillation framework in which a CNN-based EfficientNetV2 teacher guides a Vision Transformer student fine-tuned with Low-Rank Adaptation (LoRA). An adapter maps teacher embeddings into the student's feature space, and the training objective combines a distillation loss $L_{KL}$ with a classification loss $L_{CE}$ to improve detection while keeping computational cost low. Evaluations on morphing data built from three public face datasets with ten morphing techniques show that the proposed KD+LoRA approach achieves state-of-the-art performance (e.g., D-EER $=3.25\%$, BPCER $=1.86\%$ at MACER) and generalizes well across diverse attacks. The method shows practical potential for robust, scalable S-MAD in real-world security systems; future work will explore curriculum-style distillation and semi-supervised extensions.
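The combined objective described above can be sketched in plain Python so that it runs without dependencies. The paper's exact formulation is not reproduced here, so the following are assumptions: the adapter is a single linear layer (`adapter`, a hypothetical name), both feature vectors are turned into distributions with a temperature softmax before the KL term, the KL term is computed as KL(adapted teacher || student), and the two losses are mixed with a weight `alpha`. All names and default values are illustrative, not the authors' code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [v / temperature for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adapter(teacher_emb: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical linear adapter: maps a teacher embedding into the student's feature space."""
    return [sum(w * t for w, t in zip(row, teacher_emb)) + b for row, b in zip(weight, bias)]


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits: Vector, label: int) -> float:
    """CE between student logits and the true class index (assumed 0 = bona fide, 1 = morph)."""
    return -math.log(softmax(logits)[label])


def combined_loss(teacher_emb: Vector, student_emb: Vector, student_logits: Vector,
                  label: int, weight: Matrix, bias: Vector,
                  alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Assumed form: alpha * L_KL + (1 - alpha) * L_CE; alpha and temperature are illustrative."""
    # Distillation target: the teacher's embedding after the adapter.
    target = softmax(adapter(teacher_emb, weight, bias), temperature)
    student = softmax(student_emb, temperature)
    l_kl = kl_divergence(target, student)
    l_ce = cross_entropy(student_logits, label)
    return alpha * l_kl + (1.0 - alpha) * l_ce


identity = [[1.0, 0.0], [0.0, 1.0]]
loss = combined_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], label=1,
                     weight=identity, bias=[0.0, 0.0])
# Adapted teacher and student features coincide, so L_KL = 0 and loss = 0.5 * ln 2.
print(loss)
```

Mixing the two terms with a single weight is the simplest choice; many distillation setups also rescale the KL term by the squared temperature, which this sketch leaves out.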

Abstract

Face Recognition Systems (FRS) are critical for security but remain vulnerable to morphing attacks, where synthetic images blend biometric features from multiple individuals. We propose a novel Single-Image Morphing Attack Detection (S-MAD) approach using a teacher-student framework, where a CNN-based teacher model guides a ViT-based student model. To improve efficiency, we integrate Low-Rank Adaptation (LoRA) for fine-tuning, reducing computational costs while maintaining high detection accuracy. Extensive experiments are conducted on a morphing dataset built from three publicly available face datasets, incorporating ten different morphing generation algorithms to assess robustness. The proposed method is benchmarked against six state-of-the-art S-MAD techniques, demonstrating superior detection performance and computational efficiency.
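LoRA keeps a pre-trained weight matrix $W$ frozen and learns only a low-rank update, so an adapted layer computes $Wx + \frac{\alpha}{r} BAx$ with $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$. The plain-Python sketch below illustrates that idea. The rank, scaling, initialization and the choice of which ViT layers receive adapters are assumptions, since the abstract does not specify them; `LoRALinear` and `lora_param_count` are hypothetical names.

```python
import random
from typing import List

Matrix = List[List[float]]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update scaled by alpha / r."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # pre-trained, kept frozen during fine-tuning
        # A (r x d_in): small random init; B (d_out x r): zeros, so the update starts at 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]  # W x
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]          # A x (r values)
        bax = [sum(b * v for b, v in zip(row, ax)) for row in self.B]          # B (A x)
        return [y + self.scale * d for y, d in zip(base, bax)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in layer: r * (d_in + d_out)."""
    return rank * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
print(layer.forward([1.0, 1.0]))  # B is zero-initialized, so this equals W x: [3.0, 7.0]

# Example: a 768 x 768 projection (ViT-Base hidden size) with an assumed rank of 8.
print(lora_param_count(768, 768, 8), 768 * 768)  # 12288 trainable vs 589824 frozen
```

Because only `A` and `B` are updated, the trainable parameter count for such a layer drops to roughly 2% of the full matrix, which illustrates why LoRA lowers fine-tuning cost.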

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of the proposed method: bona fide and morph images are passed through the frozen, pre-trained teacher and the trainable student model. The adapter transforms the teacher's feature embeddings and passes them to the KL loss, which compares them with the student's features for knowledge distillation. The student's logits and the true labels are used to compute the CE loss. The two losses are combined into the student's training objective.
  • Figure 2: Examples of bona fide images and of morphing images generated with different morphing generation techniques (MT) for DB1, DB2 and DB3.
  • Figure 3: Detection Error Trade-off (DET) curves for the proposed method and state-of-the-art S-MAD algorithms.
  • Figure 4: LIME-based activation maps: (a) and (b) show correctly classified bona fide images; (c) and (d) show misclassified bona fide images; (e) and (f) show correctly classified morph images; (g) and (h) show misclassified morph images.
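The DET curves in Figure 3 and the D-EER and BPCER figures in the TL;DR are all derived from detector scores at varying thresholds. The following minimal sketch shows one way to compute them, under these assumptions: higher scores mean "more likely a morph"; the attack-side error (APCER in ISO/IEC 30107-3 terms, which morphing papers typically report as MACER) is the fraction of morphs scored below the threshold; and D-EER is taken at the empirical threshold where the two error rates are closest rather than by interpolating the DET curve. The function names are hypothetical, and this is not the paper's evaluation code.

```python
import math
from typing import List, Tuple


def error_rates(bona_fide: List[float], morph: List[float], t: float) -> Tuple[float, float]:
    """Return (attack_error, bpcer) when samples with score >= t are flagged as morphs."""
    attack_error = sum(s < t for s in morph) / len(morph)    # morphs accepted as bona fide
    bpcer = sum(s >= t for s in bona_fide) / len(bona_fide)  # bona fide flagged as morphs
    return attack_error, bpcer


def candidate_thresholds(bona_fide: List[float], morph: List[float]) -> List[float]:
    # Every observed score, plus +inf (nothing flagged), covers all distinct operating points.
    return sorted(set(bona_fide) | set(morph)) + [math.inf]


def d_eer(bona_fide: List[float], morph: List[float]) -> float:
    """Empirical D-EER: mean of the two errors where they are closest (no interpolation)."""
    rates = [error_rates(bona_fide, morph, t) for t in candidate_thresholds(bona_fide, morph)]
    attack_error, bpcer = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (attack_error + bpcer) / 2


def bpcer_at(bona_fide: List[float], morph: List[float], max_attack_error: float) -> float:
    """Lowest BPCER over thresholds whose attack-side error does not exceed the target."""
    rates = [error_rates(bona_fide, morph, t) for t in candidate_thresholds(bona_fide, morph)]
    return min(b for a, b in rates if a <= max_attack_error)


bona = [0.1, 0.2, 0.3, 0.4]
morphs = [0.35, 0.6, 0.7, 0.8]
print(d_eer(bona, morphs))           # 0.25: at t = 0.4 one morph and one bona fide are misclassified
print(bpcer_at(bona, morphs, 0.0))   # 0.25
print(bpcer_at(bona, morphs, 0.25))  # 0.0
```

Published D-EER values are usually read from an interpolated DET curve, so on small score sets they can differ slightly from this step-wise estimate.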