ColFigPhotoAttnNet: Reliable Finger Photo Presentation Attack Detection Leveraging Window-Attention on Color Spaces

Anudeep Vurity, Emanuela Marasco, Raghavendra Ramachandra, Jongwoo Park

TL;DR

This work tackles the cross-device robustness problem in finger photo presentation attack detection by introducing ColFigPhotoAttnNet, a hybrid architecture that processes RGB, HSV, and YCbCr color spaces in parallel via MobileNet V3 backbones. It employs window-based self-attention within 7×7 local regions to capture localized color-space relationships, followed by a Nested Residual Block predictor and 8-bit dynamic quantization for mobile deployment. The framework is evaluated on three finger-photo datasets across inter- and intra-capture settings, demonstrating superior generalization compared to state-of-the-art CNNs and transformers, and revealing the benefits of multi-color-space fusion alongside the trade-offs introduced by quantization. The results emphasize the impact of capture-device evolution on PAD performance and highlight ColFigPhotoAttnNet as a practical, efficient solution for robust, device-agnostic finger photo PAD in real-world mobile security contexts.
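The three color-space views named above can be illustrated for a single pixel. The sketch below is only an assumption about how such views might be derived: it uses Python's standard `colorsys` module for HSV and the full-range BT.601 (JPEG-style) formula for YCbCr, since the summary does not state which conversion the authors use. The function names `rgb_to_ycbcr` and `color_views` are hypothetical.

```python
import colorsys
from typing import Dict, Tuple

Triple = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Triple:
    """Full-range BT.601 conversion (an assumed variant); inputs in [0, 255].

    Outputs are not rounded or clipped, so extreme colors can land
    slightly outside [0, 255].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def color_views(r: float, g: float, b: float) -> Dict[str, Triple]:
    """Return the RGB, HSV, and YCbCr representations of one 8-bit pixel."""
    # colorsys expects channels scaled to [0, 1] and returns h, s, v in [0, 1].
    hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return {"rgb": (r, g, b), "hsv": hsv, "ycbcr": rgb_to_ycbcr(r, g, b)}
```

In a full pipeline, each of the three views of the image would feed its own backbone branch; the per-pixel form here is only meant to make the conversion step concrete.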

Abstract

Finger photo Presentation Attack Detection (PAD) can significantly strengthen smartphone security. However, existing PAD algorithms are trained to detect only certain types of attacks. Furthermore, they are designed to operate on images acquired by specific capture devices, which leads to poor generalization and a lack of robustness as mobile hardware evolves. This investigation is the first to systematically analyze the performance degradation of existing deep learning PAD systems, both convolutional and transformer-based, in cross-capture-device settings. In this paper, we introduce the ColFigPhotoAttnNet architecture, which applies window attention on color channels, followed by a nested residual network as the predictor, to achieve reliable PAD. Extensive experiments using various capture devices, including iPhone 13 Pro, Google Pixel 3, Nokia C5, and OnePlus One, were carried out to evaluate the performance of the proposed and existing methods on three publicly available databases. The findings underscore the effectiveness of our approach.
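To make the idea of window attention concrete, here is a minimal sketch in plain Python. It splits a feature grid into non-overlapping 7×7 windows (the window size given in the TL;DR) and runs self-attention only among the tokens of each window. It is a simplification under stated assumptions, not the authors' implementation: it uses a single head and identity query/key/value projections, and the names `partition_windows`, `self_attention`, and `window_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def partition_windows(grid: Grid, size: int = 7) -> List[List[Vector]]:
    """Split the grid into non-overlapping size x size windows (row-major).

    Each window is returned as a flat list of size * size tokens.
    """
    h, w = len(grid), len(grid[0])
    if h % size or w % size:
        raise ValueError("height and width must be multiples of the window size")
    windows = []
    for top in range(0, h, size):
        for left in range(0, w, size):
            windows.append([grid[i][j]
                            for i in range(top, top + size)
                            for j in range(left, left + size)])
    return windows


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with identity Q/K/V (a simplification)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens)) / total
                    for c in range(dim)])
    return out


def window_attention(grid: Grid, size: int = 7) -> List[List[Vector]]:
    """Apply self-attention independently inside every window."""
    return [self_attention(win) for win in partition_windows(grid, size)]
```

Restricting attention to a window keeps the cost per window fixed at 49 × 49 token pairs, so the total cost grows linearly with the number of windows rather than quadratically with the full grid size.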

Paper Structure

This paper contains 11 sections, 4 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Left: the bona fide and attack capture mechanisms. Right: examples of each scenario.
  • Figure 2: The architecture uses MobileNet-V3 for feature extraction and applies pointwise convolution within a bottleneck framework, with window attention mechanisms that use fine-tuned Swin transformer weights. The features of the three color spaces are then combined with element-wise addition and pointwise convolution and fed into a Nested Residual Block initialized with ResNet34 weights. Finally, at inference, the model applies Dynamic Quantization and produces the final global decision.
  • Figure 3: DET curves showing the performance of the models on iPhone 13 Pro and Google Pixel 3 capture devices
  • Figure 4: Boxplot showing the interoperability of the models
  • Figure 5: Number of parameters vs. GMACs
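The fusion step described in the Figure 2 caption (element-wise addition of the three color-space feature maps, then a pointwise convolution) can be sketched as follows. This is an illustration under assumptions: the order "add first, then 1×1 convolution" is one reading of the caption, bias and activation are omitted, and the names `fuse_pixel` and `fuse_maps` are hypothetical. A pointwise convolution is a linear map across channels applied independently at each spatial position, which is what the inner function computes.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def fuse_pixel(features: List[Vector], weights: List[Vector]) -> Vector:
    """Fuse per-branch feature vectors at one spatial position.

    features: one C_in-dimensional vector per color-space branch.
    weights:  C_out rows of C_in values (the 1x1 convolution kernel).
    """
    # Element-wise addition across branches.
    summed = [sum(values) for values in zip(*features)]
    # Pointwise (1x1) convolution: a channel-mixing matrix-vector product.
    return [sum(w * x for w, x in zip(row, summed)) for row in weights]


def fuse_maps(maps: List[FeatureMap], weights: List[Vector]) -> FeatureMap:
    """Apply fuse_pixel at every position of equally sized feature maps."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[fuse_pixel([m[i][j] for m in maps], weights)
             for j in range(width)]
            for i in range(height)]
```

Because addition requires matching shapes, this form assumes all three branches emit feature maps with the same height, width, and channel count before fusion.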