LoRA-Enhanced Vision Transformer for Single Image based Morphing Attack Detection via Knowledge Distillation from EfficientNet
Ria Shekhawat, Sushrut Patwardhan, Raghavendra Ramachandra, Praveen Kumar Chandaliya, Kishor P. Upla
TL;DR
This work tackles morphing attacks on Face Recognition Systems (FRS) with a knowledge-distillation framework in which a CNN-based EfficientNetV2 teacher guides a Vision Transformer (ViT) student fine-tuned with Low-Rank Adaptation (LoRA). An adapter maps teacher embeddings into the student's embedding space, and the training objective combines a Kullback-Leibler divergence term $L_{KL}$ with a cross-entropy term $L_{CE}$ to improve detection while keeping computational cost low. Evaluations on a morphing dataset built from three public face datasets and ten morphing techniques show that the proposed KD+LoRA approach achieves state-of-the-art performance (e.g., D-EER $=3.25\%$, BPCER $=1.86\%$ at MACER) and generalizes well across diverse attacks. The method shows practical potential for robust and scalable Single-Image Morphing Attack Detection (S-MAD) in real-world security systems; future work will focus on curriculum-style distillation and semi-supervised extensions.
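To make the combined objective concrete, the following is a minimal sketch of a distillation loss that mixes $L_{KL}$ and $L_{CE}$ for a single sample. The summary does not say how the two terms are weighted or whether the KL term is computed on class probabilities or on adapted embeddings, so the weight `alpha`, the softening `temperature`, and the choice to apply KL to temperature-softened class probabilities are all assumptions made for illustration. The adapter is omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of predicted probabilities against a hard label."""
    return -math.log(probs[label] + eps)

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """alpha * L_KL + (1 - alpha) * L_CE (weighting is an assumption)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The temperature**2 factor is the usual rescaling for softened targets.
    l_kl = kl_divergence(p_teacher, p_student) * temperature ** 2
    # The hard-label term uses the unsoftened student prediction.
    l_ce = cross_entropy(softmax(student_logits), label)
    return alpha * l_kl + (1 - alpha) * l_ce

# Two classes: index 0 = bona fide, index 1 = morph (hypothetical ordering).
loss_agree = distillation_loss([2.0, -1.0], [2.0, -1.0], label=0)
loss_disagree = distillation_loss([-1.0, 2.0], [2.0, -1.0], label=0)
print(loss_agree < loss_disagree)
```

When the student reproduces the teacher's logits, the KL term vanishes and only the weighted cross-entropy remains, which is why the first loss is smaller than the second.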
Abstract
Face Recognition Systems (FRS) are critical for security but remain vulnerable to morphing attacks, where synthetic images blend biometric features from multiple individuals. We propose a novel Single-Image Morphing Attack Detection (S-MAD) approach using a teacher-student framework, in which a CNN-based teacher model guides the training of a Vision Transformer (ViT)-based student model. To improve efficiency, we integrate Low-Rank Adaptation (LoRA) for fine-tuning, reducing computational costs while maintaining high detection accuracy. Extensive experiments are conducted on a morphing dataset built from three publicly available face datasets, incorporating ten different morphing generation algorithms to assess robustness. The proposed method is benchmarked against six state-of-the-art S-MAD techniques, demonstrating superior detection performance and computational efficiency.
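The efficiency argument for LoRA can be illustrated with a small sketch. LoRA keeps a pretrained weight matrix $W$ frozen and learns a low-rank update $BA$, so only $r(d_{out} + d_{in})$ parameters are trained instead of $d_{out} \cdot d_{in}$. The abstract does not state the rank, the scaling factor, or the ViT variant used, so the values below (hidden size 768, rank 8, `alpha=16.0`) are assumptions chosen only for illustration.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16.0):
    """Compute W x + (alpha / r) * B (A x), with W frozen and A, B trainable."""
    rank = len(A)
    scale = alpha / rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)

random.seed(0)
d_in, d_out, rank = 6, 4, 2  # toy sizes so the example runs instantly
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at the start
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero, the adapted layer matches the frozen layer.
print(lora_forward(W, A, B, x) == matvec(W, x))

# Parameter counts for one 768 x 768 projection at rank 8 (assumed sizes).
full, lora = trainable_params(768, 768, 8)
print(full, lora, f"{100 * lora / full:.2f}%")
```

Under these assumed sizes, the low-rank update trains about 2% of the parameters of the matrix it adapts, which is the source of the reduced fine-tuning cost.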
