VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection

Raghavendra Ramachandra; Narayan Vetrekar; Sushma Venkatesh; Savita Nageshker; Jag Mohan Singh; R. S. Gad

TL;DR

This work tackles face presentation attack detection (PAD) on smartphones using 3D point clouds captured with the front camera, which are voxelized to preserve their spatial structure. It introduces VoxAtnNet, a 23-layer residual 3D CNN with a novel attention mechanism that operates on a $64 \times 64 \times 64$ occupancy grid to discriminate bona fide faces from 3D presentation attack instruments (PAIs). A new dataset, 3D-PCPA, comprises bona fide samples, 3D silicone mask attacks, and 3D wrap photo attacks, which allows evaluation on unseen instruments. In the reported experiments, VoxAtnNet outperforms existing methods and generalizes across the three evaluation protocols (intra, inter, and both). The findings indicate that 3D point-cloud-based PAD is feasible on smartphones and offers a path toward more robust, device-agnostic biometric security.

Abstract

Facial biometrics are an essential component of smartphones for reliable and trustworthy authentication. However, face biometric systems are vulnerable to Presentation Attacks (PAs), and the availability of more sophisticated Presentation Attack Instruments (PAIs), such as 3D silicone face masks, allows attackers to deceive face recognition systems easily. In this work, we propose a novel Presentation Attack Detection (PAD) algorithm based on 3D point clouds captured using the frontal camera of a smartphone. The proposed PAD algorithm, VoxAtnNet, voxelizes the 3D point clouds to preserve their spatial structure. The voxelized 3D samples are then used to train a novel convolutional attention network that detects PAs on the smartphone. Extensive experiments were carried out on a newly constructed 3D face point cloud dataset comprising bona fide samples and two different 3D PAIs (3D silicone face mask and wrap photo mask), resulting in 3480 samples. The proposed method was benchmarked against existing methods using three different evaluation protocols. The experimental results demonstrate the improved performance of the proposed method in detecting both known and unknown face presentation attacks.
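To make the voxelization step concrete, the following is a minimal sketch of turning a point cloud into a cubic occupancy grid of the size mentioned in the TL;DR ($64 \times 64 \times 64$). It is not the authors' implementation: the function names `voxelize` and `to_dense`, the bounding-box normalization, and the use of a single scale for all three axes are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], grid_size: int = 64) -> Set[Voxel]:
    """Return the set of occupied cells of a grid_size^3 occupancy grid.

    Hypothetical helper: the normalization below is an assumption,
    not the preprocessing described in the paper.
    """
    pts = list(points)
    if not pts:
        return set()
    # Axis-aligned bounding box of the point cloud.
    mins = [min(p[a] for p in pts) for a in range(3)]
    maxs = [max(p[a] for p in pts) for a in range(3)]
    # One shared scale for all axes, so the shape is not stretched.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    occupied: Set[Voxel] = set()
    for p in pts:
        # Map each coordinate to a cell index; clamp the upper boundary.
        cell = tuple(
            min(int((p[a] - mins[a]) / extent * grid_size), grid_size - 1)
            for a in range(3)
        )
        occupied.add(cell)
    return occupied


def to_dense(occupied: Set[Voxel], grid_size: int = 64) -> List[List[List[int]]]:
    """Expand the occupied-cell set into a dense 0/1 grid (nested lists)."""
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for x, y, z in occupied:
        grid[x][y][z] = 1
    return grid
```

A dense grid of this kind is the sort of fixed-size input a 3D CNN can consume; the sparse set is kept here only because most cells of a face scan are empty.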

Paper Structure (11 sections, 9 figures, 5 tables)

This paper contains 11 sections, 9 figures, 5 tables.

INTRODUCTION
Proposed VoxAtnNet 3D Face PAD
3D face point cloud presentation attack Dataset (3D-PCPA)
Bona fide subset of 3D-PCPA Database
Presentation attack subsets for 3D-PCPA database
Experiments and Results
Performance evaluation protocol
Results and discussion
Ablation Study
Limitations
Conclusion

Figures (9)

Figure 1: Examples of 3D point clouds captured using the frontal camera of an Apple iPhone 12 Pro: (a) bona fide, (b) 3D silicone face mask, (c) wrap paper attack, (d) print paper attack, (e) display attack. Note that 2D artefacts such as print paper and display attacks are easy to detect because they lack depth information.
Figure 2: Block diagram of the proposed VoxAtnNet for face PAD. The novelty of VoxAtnNet lies in the voxelization and the attention module with skip connections.
Figure 3: Qualitative results of the voxelization of bona fide samples and PAIs. The voxelized bona fide samples show a rich spatial structure (high-quality surface details) compared with both PAIs. The 3D wrap print attacks show a poorer spatial structure (poorer surface details) than the 3D silicone face masks.
Figure 4: Example 3D point cloud samples from the 3D-PCPA dataset corresponding to bona fide, 3D silicone face mask, and wrap paper mask.
Figure 5: DET curves showing the detection performance with the Intra protocol (best viewed in color). The x-axis shows the APCER and the y-axis shows the BPCER.
...and 4 more figures
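
The DET curves in Figure 5 plot APCER against BPCER. As a rough illustration of how such points can be computed from detector scores, here is a minimal sketch. It uses the standard definitions (APCER: fraction of attack presentations accepted as bona fide; BPCER: fraction of bona fide presentations rejected as attacks). The function names and the score convention (higher score means more likely bona fide) are assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def apcer_bpcer(
    attack_scores: Sequence[float],
    bona_fide_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Compute (APCER, BPCER) at one decision threshold.

    Assumed convention: a sample is accepted as bona fide
    when its score is >= threshold.
    """
    # Attacks wrongly accepted as bona fide.
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # Bona fide samples wrongly rejected as attacks.
    bpcer = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    return apcer, bpcer


def det_points(
    attack_scores: Sequence[float],
    bona_fide_scores: Sequence[float],
) -> List[Tuple[float, float]]:
    """Sweep every observed score as a threshold to trace DET curve points."""
    thresholds = sorted(set(attack_scores) | set(bona_fide_scores))
    return [apcer_bpcer(attack_scores, bona_fide_scores, t) for t in thresholds]
```

Raising the threshold lowers APCER at the cost of a higher BPCER, which is the trade-off a DET curve visualizes.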
