Towards Iris Presentation Attack Detection with Foundation Models
Juan E. Tapia, Lázaro Janier González-Soler, Christoph Busch
TL;DR
This paper investigates iris Presentation Attack Detection (PAD) with foundation models to address data scarcity and poor cross-domain generalization. Evaluating DinoV2 and VisualOpenClip backbones with a lightweight classification head, the study shows that fine-tuning can surpass traditional deep learning approaches: DinoV2-ViTB14 reaches an EER of 6.77% and a BPCER10 (BPCER at an APCER of 10%) of 3.38%, and VisualOpenClip also performs competitively. In a complementary from-scratch setting, conventional CNNs such as DenseNet121 can outperform the foundation models when ample bona fide and attack data are available. The findings highlight the practicality of foundation-model-based iris PAD in data-limited regimes, while suggesting that traditional training can still be advantageous when sufficient data exist. Inference remains on-device, preserving privacy by avoiding exposure of data to the cloud.
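The EER and BPCER10 figures reported above are operating points built on the two error rates defined in ISO/IEC 30107-3: APCER, the fraction of attack presentations wrongly accepted as bona fide, and BPCER, the fraction of bona fide presentations wrongly rejected as attacks. The EER is the point where both rates are equal, and BPCER10 is the BPCER at the threshold where APCER is 10%. As a minimal sketch, assuming detector scores where higher means "more attack-like" and a plain sweep over the observed scores as thresholds (no interpolation), both figures can be computed as below. The function names and toy scores are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy scores (higher = more attack-like); not data from the paper.
TOY_BONA_FIDE = [0.1, 0.2, 0.3, 0.4, 0.6]
TOY_ATTACK = [0.35, 0.5, 0.7, 0.8, 0.9]


def error_rates(bona_fide: Sequence[float], attack: Sequence[float],
                t: float) -> Tuple[float, float]:
    """Return (APCER, BPCER) when scores >= t are classified as attacks."""
    # APCER: attack presentations scored below t, i.e. accepted as bona fide.
    apcer = sum(s < t for s in attack) / len(attack)
    # BPCER: bona fide presentations scored at or above t, i.e. rejected.
    bpcer = sum(s >= t for s in bona_fide) / len(bona_fide)
    return apcer, bpcer


def sweep(bona_fide: Sequence[float],
          attack: Sequence[float]) -> List[Tuple[float, float]]:
    """(APCER, BPCER) at every observed score, plus +inf ("reject nothing")."""
    thresholds = sorted(set(bona_fide) | set(attack)) + [float("inf")]
    return [error_rates(bona_fide, attack, t) for t in thresholds]


def eer(bona_fide: Sequence[float], attack: Sequence[float]) -> float:
    """Approximate EER: average of APCER and BPCER where they are closest."""
    apcer, bpcer = min(sweep(bona_fide, attack), key=lambda r: abs(r[0] - r[1]))
    return (apcer + bpcer) / 2


def bpcer_at_apcer(bona_fide: Sequence[float], attack: Sequence[float],
                   max_apcer: float = 0.10) -> float:
    """Lowest BPCER among thresholds whose APCER does not exceed max_apcer.

    With max_apcer=0.10 this is BPCER10. The lowest threshold always gives
    APCER = 0, so at least one candidate is feasible.
    """
    return min(b for a, b in sweep(bona_fide, attack) if a <= max_apcer)


if __name__ == "__main__":
    print(f"EER     = {eer(TOY_BONA_FIDE, TOY_ATTACK):.2%}")             # EER     = 20.00%
    print(f"BPCER10 = {bpcer_at_apcer(TOY_BONA_FIDE, TOY_ATTACK):.2%}")  # BPCER10 = 40.00%
```

The sweep is quadratic in the number of scores, which is fine for illustration; a production evaluation would sort once and use cumulative counts, and may interpolate between thresholds, so exact values can differ slightly from this sketch.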
Abstract
Foundation models are becoming increasingly popular because training on huge datasets gives them strong generalization capabilities. These capabilities are attractive in areas such as NIR iris Presentation Attack Detection (PAD), where databases are limited in the number of subjects and the diversity of attack instruments, and where there is no correspondence between bona fide and attack images because, most of the time, they do not belong to the same subjects. This work explores an iris PAD approach based on two foundation models, DinoV2 and VisualOpenClip. The results show that fine-tuning the prediction with a small neural network as the head surpasses state-of-the-art deep learning approaches. However, systems trained from scratch still achieve better results when both bona fide and attack images are available.
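The abstract describes placing a small neural network head on top of a foundation model. A minimal sketch of the simplest such head follows, assuming the backbone (for example DinoV2 or VisualOpenClip) has already turned each iris image into a fixed embedding vector and that only the head is trained: a single linear layer with a sigmoid, fit by gradient descent on binary cross-entropy. This is plain logistic regression, deliberately simpler than the paper's head; its exact architecture, and whether backbone weights are also updated during fine-tuning, are not specified here. The class name `LinearPADHead` and the toy 2-D embeddings are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical toy 2-D "embeddings": bona fide (label 0) near (-1, -1),
# attacks (label 1) near (1, 1). Real backbone embeddings are much larger.
TOY_EMBEDDINGS = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0],
                  [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
TOY_LABELS = [0, 0, 0, 1, 1, 1]


class LinearPADHead:
    """Hypothetical minimal PAD head: one linear layer + sigmoid.

    predict_proba returns the estimated probability that an embedding comes
    from an attack presentation (label 1) rather than a bona fide one (label 0).
    """

    def __init__(self, dim: int) -> None:
        self.w: List[float] = [0.0] * dim
        self.b = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        # Numerically stable sigmoid: avoid exp() of a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            lr: float = 0.5, epochs: int = 200) -> "LinearPADHead":
        """Full-batch gradient descent on binary cross-entropy.

        Only the head's weights change; the embeddings are fixed inputs,
        standing in for features from a frozen backbone (an assumption).
        """
        n = len(xs)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in zip(xs, ys):
                err = self.predict_proba(x) - y  # d(BCE)/dz for a sigmoid output
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wj - lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self


if __name__ == "__main__":
    head = LinearPADHead(dim=2).fit(TOY_EMBEDDINGS, TOY_LABELS)
    for x, y in zip(TOY_EMBEDDINGS, TOY_LABELS):
        print(f"label={y}  P(attack)={head.predict_proba(x):.3f}")
```

Keeping the backbone fixed and training only a small head is what makes this kind of approach cheap when PAD data are scarce; the head's attack probabilities can then be thresholded and evaluated with APCER/BPCER-based metrics such as EER and BPCER10.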
