SpectraIrisPAD: Leveraging Vision Foundation Models for Spectrally Conditioned Multispectral Iris Presentation Attack Detection

Raghavendra Ramachandra, Sushma Venkatesh

TL;DR

SpectraIrisPAD addresses the vulnerability of iris recognition to presentation attacks by using multispectral imaging across five NIR bands and a DINOv2-based transformer with spectral conditioning. The framework introduces spectral positional encoding, token fusion, band-specific dropout, and a mask-aware ensemble to robustly fuse band information, achieving superior generalization to unseen PAIs. A new MSIrPAD dataset with 18,848 iris samples across five bands and eight attack instruments enables rigorous cross-artefact evaluation, where SpectraIrisPAD consistently outperforms state-of-the-art baselines. The work demonstrates the practical viability of spectral diversity and transformer-based representations for secure, real-world iris PAD, and provides a foundation for future cross-sensor and uncertainty-aware extensions.

Abstract

Iris recognition is widely recognized as one of the most accurate biometric modalities. However, its growing deployment in real-world applications raises significant concerns regarding its vulnerability to Presentation Attacks (PAs). Effective Presentation Attack Detection (PAD) is therefore critical to ensure the integrity and security of iris-based biometric systems. While conventional iris recognition systems predominantly operate in the near-infrared (NIR) spectrum, multispectral imaging across multiple NIR bands provides complementary reflectance information that can enhance the generalizability of PAD methods. In this work, we propose SpectraIrisPAD, a novel deep learning-based framework for robust multispectral iris PAD. SpectraIrisPAD leverages a DINOv2 Vision Transformer (ViT) backbone equipped with learnable spectral positional encoding, token fusion, and contrastive learning to extract discriminative, band-specific features that effectively distinguish bona fide samples from various spoofing artifacts. Furthermore, we introduce a new comprehensive dataset, Multispectral Iris PAD (MSIrPAD), with diverse Presentation Attack Instruments (PAIs), captured using a custom-designed multispectral iris sensor operating at five distinct NIR wavelengths (800 nm, 830 nm, 850 nm, 870 nm, and 980 nm). The dataset includes 18,848 iris images encompassing eight diverse PAI categories, including five textured contact lenses, print attacks, and display-based attacks. We conduct comprehensive experiments under unseen attack evaluation protocols to assess the generalization capability of the proposed method. SpectraIrisPAD consistently outperforms several state-of-the-art baselines across all performance metrics, demonstrating superior robustness and generalizability in detecting a wide range of presentation attacks.

Paper Structure

This paper contains 38 sections, 19 equations, 3 figures, 13 tables.

Figures (3)

  • Figure 1: Overview of the proposed SpectraIrisPAD. Each band $b\in\{800,830,850,870,980\}$ (wavelength in nm) is processed with DINOv2 and Spectral Positional Encoding (SPE), which injects a learnable embedding $E_b$ into the CLS token to obtain $\mathrm{CLS}^{\mathrm{SPE}}_b$. The CLS and mean patch tokens are fused, followed by band-adaptive dropout (BandDropout) with probability $p_b$ and a linear classifier that produces the per-band posterior $P_b$. Bands are combined only at the probability level, through a mask-aware (missing-band robust) ensemble weighted by development-set accuracy weights $w_b$, to yield the fused prediction $P_{\mathrm{ens}}(x)$. All trainable modules are band-specific with no cross-band weight sharing.
  • Figure 2: Example captured images of bona fide and attack presentations from the MSIrPAD dataset.
  • Figure 3: Radar plots showing the cross-artefact generalisation performance of the proposed PAD method and existing baselines in terms of (a) D-EER and (b) HTER across eight artefact types. In each figure, a point corresponding to Artefact #i indicates the performance when the PAD models are trained on Artefact #i and tested on all remaining artefacts ($\forall i = 1, 2, \ldots, 8$). Lower values indicate better generalisation.
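
The probability-level fusion described in the Figure 1 caption can be illustrated with a minimal sketch. The caption only states that the per-band posteriors $P_b$ are combined with development-set weights $w_b$ in a mask-aware way; the exact formula is not given here, so the code below assumes a weighted average that is renormalised over the bands that are actually present. The function name `mask_aware_ensemble`, the use of `None` to mark a missing band, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# NIR wavelengths in nm, as listed in the Figure 1 caption.
BANDS = (800, 830, 850, 870, 980)


def mask_aware_ensemble(
    band_probs: Dict[int, Optional[float]],
    dev_weights: Dict[int, float],
) -> float:
    """Fuse per-band posteriors P_b into one score, skipping missing bands.

    band_probs maps band -> P_b in [0, 1], or None when the band is missing.
    dev_weights maps band -> w_b (for example, development-set accuracy).
    Assumption: weights are renormalised over the available bands only.
    """
    available = [b for b in BANDS if band_probs.get(b) is not None]
    if not available:
        raise ValueError("at least one band must be available")
    total = sum(dev_weights[b] for b in available)
    if total <= 0:
        raise ValueError("weights of available bands must sum to a positive value")
    return sum(dev_weights[b] * band_probs[b] for b in available) / total


if __name__ == "__main__":
    # Made-up weights and posteriors; the 850 nm band is treated as missing.
    weights = {800: 0.90, 830: 0.92, 850: 0.95, 870: 0.93, 980: 0.88}
    probs = {800: 0.8, 830: 0.7, 850: None, 870: 0.9, 980: 0.6}
    print(round(mask_aware_ensemble(probs, weights), 4))
```

Renormalising over the available bands keeps the fused score in $[0, 1]$ even when some bands are dropped, which is one plausible reading of "missing-band robust"; the paper's actual ensemble may differ in detail.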