Multimodal Biometric Authentication Using Camera-Based PPG and Fingerprint Fusion

Xue Xian Zheng, M. M. Ur Rahman, Bilal Taha, Mudassir Masood, Dimitrios Hatzinakos, Tareq Al-Naffouri

TL;DR

A multimodal biometric system that fuses PPG signals extracted from smartphone videos with fingerprint data to improve user-verification accuracy, achieving superior performance across various evaluation metrics in both single-session and dual-session authentication scenarios.

Abstract

Camera-based photoplethysmography (PPG) obtained from smartphones has shown great promise for personalized healthcare and secure authentication. This paper presents a multimodal biometric system that integrates PPG signals extracted from videos with fingerprint data to enhance the accuracy of user verification. The system requires users to place their fingertip on the camera lens for a few seconds, allowing its unique biometric characteristics to be captured and processed. Our approach employs a neural network with two structured state-space model (SSM) encoders, one for each modality. Fingerprint images are transformed into pixel sequences and, together with segmented PPG waveforms, are fed into the encoders. A cross-modal attention mechanism then extracts refined feature representations, and a distribution-oriented contrastive loss function aligns these features within a unified latent space. Experimental results demonstrate the system's superior performance across various evaluation metrics in both single-session and dual-session authentication scenarios.
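
The abstract states that PPG waveforms are extracted from fingertip video and segmented, and that fingerprint images become pixel sequences before encoding, but it does not give the preprocessing details. The following is only a minimal NumPy sketch under common assumptions, not the authors' pipeline: frames are RGB arrays, the raw PPG trace is the inverted mean red-channel intensity per frame, beats are split at local minima and resampled to a fixed length (the `beat_len=64` value is hypothetical), and each grayscale pixel becomes one sequence element before a per-element linear embedding. All function names are hypothetical.

```python
import numpy as np


def ppg_from_video(frames: np.ndarray) -> np.ndarray:
    """Raw PPG trace from fingertip video (assumed RGB frames, shape (T, H, W, 3)).
    Returns a z-normalized trace of length T."""
    red = frames[..., 0].astype(np.float64)
    trace = red.mean(axis=(1, 2))          # mean red intensity per frame
    trace = -(trace - trace.mean())        # assumed convention: more blood -> darker frame, so invert
    return trace / (trace.std() + 1e-8)


def segment_beats(ppg: np.ndarray, fs: float, beat_len: int = 64) -> np.ndarray:
    """Split the trace at local minima (taken as beat onsets) and resample each
    beat to beat_len samples. Naive detector: no filtering, noise-sensitive."""
    min_gap = int(0.4 * fs)                # assumes heart rate below 150 bpm
    onsets = []
    for i in range(1, len(ppg) - 1):
        if ppg[i] < ppg[i - 1] and ppg[i] <= ppg[i + 1]:
            if not onsets or i - onsets[-1] >= min_gap:
                onsets.append(i)
    beats = []
    for a, b in zip(onsets[:-1], onsets[1:]):
        seg = ppg[a:b + 1]
        grid_old = np.linspace(0.0, 1.0, len(seg))
        grid_new = np.linspace(0.0, 1.0, beat_len)
        beats.append(np.interp(grid_new, grid_old, seg))
    return np.array(beats).reshape(-1, beat_len)


def image_to_sequence(img: np.ndarray) -> np.ndarray:
    """Flatten a grayscale fingerprint image (H, W) in row-major order into
    H*W scalar pixels, shape (H*W, 1), scaled to [0, 1]."""
    return img.astype(np.float64).reshape(-1, 1) / 255.0


def linear_embed(seq: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map a (L, d_in) sequence to (L, d_model) embeddings, one per element."""
    return seq @ W + b


# Synthetic 10 s clip at 30 fps with a 1.2 Hz (72 bpm) pulse in the red channel.
fs, T = 30.0, 300
t = np.arange(T) / fs
pulse = 128.0 + 10.0 * np.sin(2 * np.pi * 1.2 * t)
frames = np.broadcast_to(pulse[:, None, None, None], (T, 4, 4, 3))
beats = segment_beats(ppg_from_video(frames), fs)

rng = np.random.default_rng(0)
d_model = 16
x_u = linear_embed(beats[0].reshape(-1, 1), rng.standard_normal((1, d_model)), np.zeros(d_model))
fingerprint = rng.integers(0, 256, size=(16, 16)).astype(np.uint8)  # placeholder image
x_v = linear_embed(image_to_sequence(fingerprint), rng.standard_normal((1, d_model)), np.zeros(d_model))
```

With this clean synthetic pulse the detector finds twelve onsets and returns eleven identical 64-sample beats. Real recordings would need band-pass filtering and a more robust onset detector, since the naive local-minimum search is easily fooled by noise and motion.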

Paper Structure

This paper contains 18 sections, 8 equations, 1 figure, 3 tables, and 1 algorithm.

Figures (1)

  • Figure 1: (Left.) The proposed multimodal biometric system. Users place their index finger on the main camera lens with the flashlight on and start recording. After the recorded videos are preprocessed, the extracted PPG beat waveform and fingerprint image are converted into sequences and linearly mapped to the embeddings $\mathbf{x}_{u}$ and $\mathbf{x}_{v}$, respectively. These embeddings are fed into the encoders $f_{\alpha}(\mathbf{x}_{u};\theta_{\alpha})$ and $f_{\beta}(\mathbf{x}_{v};\theta_{\beta})$ and then processed by a multi-head cross-modal attention module to obtain the latent representations $\mathbf{u}$ and $\mathbf{v}$. Finally, $\mathbf{u}$ and $\mathbf{v}$ are aligned and fused into $\mathbf{z}$, which is passed to the classifier $f_c(\mathbf{z};\theta_c)$ to make the final decision. (Right.) The main structure of the homogeneous encoders. Each encoder is a stack of $N$ deep sequence model blocks whose core sequence-to-sequence transformation is an SSM, complemented by standard neural network components such as residual connections, normalization, projections, and activations.
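
The caption describes each encoder as a stack of SSM-based blocks with residual connections, normalization, projections and activations, followed by multi-head cross-modal attention and fusion of $\mathbf{u}$ and $\mathbf{v}$ into $\mathbf{z}$. It does not specify the SSM parameterization, pooling, or fusion operator, so the NumPy sketch below rests on stated assumptions: a real diagonal SSM discretized with zero-order hold (S4D-style), pre-norm residual blocks with GELU, mean pooling over time, concatenation as the fusion step, shared attention weights for both directions, and a logistic classifier standing in for $f_c$. Here `n_blocks` plays the role of the caption's $N$; all names are hypothetical, weights are random and inputs are placeholders, so it only illustrates data flow and tensor shapes. The distribution-oriented contrastive loss is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each time step over the feature axis (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def gelu(x: np.ndarray) -> np.ndarray:
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class DiagonalSSM:
    """H independent SISO state-space models with n_state real diagonal states each,
    ZOH-discretized: s_k = Ab*s_{k-1} + Bb*u_k,  y_k = sum(C*s_k) + D*u_k."""

    def __init__(self, H: int, n_state: int, dt: float = 0.01):
        A = -np.tile(np.arange(1, n_state + 1, dtype=np.float64), (H, 1))  # stable poles
        B = np.ones((H, n_state))
        self.C = rng.standard_normal((H, n_state)) / np.sqrt(n_state)
        self.D = np.ones(H)
        self.Ab = np.exp(dt * A)               # ZOH: exp(dt*A)
        self.Bb = (self.Ab - 1.0) / A * B      # ZOH: A^-1 (exp(dt*A) - 1) B, elementwise

    def __call__(self, u: np.ndarray) -> np.ndarray:   # u: (L, H)
        s = np.zeros_like(self.Ab)
        y = np.empty_like(u)
        for k in range(u.shape[0]):            # causal linear recurrence over time
            s = self.Ab * s + self.Bb * u[k][:, None]
            y[k] = (self.C * s).sum(axis=1) + self.D * u[k]
        return y


class SSMBlock:
    """Pre-norm residual block: x + GELU(SSM(LayerNorm(x))) @ W."""

    def __init__(self, H: int, n_state: int):
        self.ssm = DiagonalSSM(H, n_state)
        self.W = rng.standard_normal((H, H)) / np.sqrt(H)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x + gelu(self.ssm(layer_norm(x))) @ self.W


def encode(x: np.ndarray, blocks: list) -> np.ndarray:
    for block in blocks:
        x = block(x)
    return layer_norm(x)


def cross_attention(q_seq, kv_seq, Wq, Wk, Wv, n_heads):
    """Multi-head attention: queries from one modality, keys/values from the other."""
    Lq, H = q_seq.shape
    d = H // n_heads

    def split(m):  # (L, H) -> (heads, L, d)
        return m.reshape(m.shape[0], n_heads, d).transpose(1, 0, 2)

    Q, K, V = split(q_seq @ Wq), split(kv_seq @ Wk), split(kv_seq @ Wv)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d))   # (heads, Lq, Lk)
    return (attn @ V).transpose(1, 0, 2).reshape(Lq, H)


H, n_state, n_heads, n_blocks = 16, 8, 4, 2
x_u = rng.standard_normal((64, H))    # placeholder embedded PPG beat
x_v = rng.standard_normal((256, H))   # placeholder embedded 16x16 fingerprint
h_u = encode(x_u, [SSMBlock(H, n_state) for _ in range(n_blocks)])
h_v = encode(x_v, [SSMBlock(H, n_state) for _ in range(n_blocks)])
Wq, Wk, Wv = (rng.standard_normal((H, H)) / np.sqrt(H) for _ in range(3))
u = cross_attention(h_u, h_v, Wq, Wk, Wv, n_heads).mean(axis=0)  # PPG attends to fingerprint
v = cross_attention(h_v, h_u, Wq, Wk, Wv, n_heads).mean(axis=0)  # fingerprint attends to PPG
z = np.concatenate([u, v])                        # fusion by concatenation (assumption)
w_c = rng.standard_normal(2 * H) / np.sqrt(2 * H)
score = 1.0 / (1.0 + np.exp(-(z @ w_c)))          # logistic stand-in for f_c
```

The explicit loop makes the SSM's causal, linear-time recurrence visible; practical S4-style layers compute the same linear time-invariant map as a long convolution for speed, which matters for pixel sequences as long as a flattened fingerprint.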