A Hybrid Deep Learning Model for Robust Biometric Authentication from Low-Frame-Rate PPG Signals
Arfina Rahman, Mahesh Banavar
TL;DR
This work addresses robust biometric authentication from PPG signals captured at low frame rates using monochrome video, where motion and inter-subject variability pose significant challenges. It introduces a scalogram-based time–frequency representation via Continuous Wavelet Transform (CWT) and a hybrid CVT–ConvMixer–LSTM model that jointly learns spatial, spectral, and temporal features. The approach yields strong empirical performance on the CFIHSR and BIDMC datasets, with the full model achieving an authentication accuracy of $97.68\%$ and an AUC of $0.95$, outperforming LSTM and CVT+ConvMixer baselines. The framework is designed for efficiency and on-device deployment, offering inherent liveness detection and practicality for real-world mobile and embedded security applications, with potential extensions to multimodal biometrics and lighter architectures.
Abstract
Photoplethysmography (PPG) signals, which measure changes in blood volume in the skin using light, have recently gained attention in biometric authentication because of their non-invasive acquisition, inherent liveness detection, and suitability for low-cost wearable devices. However, PPG signal quality is challenged by motion artifacts, illumination changes, and inter-subject physiological variability, making robust feature extraction and classification crucial. This study proposes a lightweight and cost-effective biometric authentication framework based on PPG signals extracted from low-frame-rate fingertip videos. The CFIHSR dataset, comprising PPG recordings from 46 subjects at a sampling rate of 14 Hz, is employed for evaluation. The raw PPG signals undergo a standard preprocessing pipeline involving baseline drift removal, motion artifact suppression using Principal Component Analysis (PCA), bandpass filtering, Fourier-based resampling, and amplitude normalization. To generate robust representations, each one-dimensional PPG segment is converted into a two-dimensional time-frequency scalogram via the Continuous Wavelet Transform (CWT), effectively capturing transient cardiovascular dynamics. We develop a hybrid deep learning model, termed CVT-ConvMixer-LSTM, that combines spatial features from the Convolutional Vision Transformer (CVT) and ConvMixer branches with temporal features from a Long Short-Term Memory (LSTM) network. Experimental results on 46 subjects demonstrate an authentication accuracy of 98%, validating the robustness of the model to noise and inter-subject variability. Due to its efficiency, scalability, and inherent liveness detection capability, the proposed system is well-suited for real-world mobile and embedded biometric security applications.
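As a rough illustration of the signal path described in the abstract, the sketch below takes a 14 Hz PPG segment through drift removal, band-limiting, optional Fourier-based resampling, and normalization, then computes a CWT scalogram. It is not the authors' implementation. The function names, the 0.5 to 4 Hz pass band, the Morlet wavelet with `w0 = 6`, and the 32-row frequency grid are all assumptions chosen for illustration, and the PCA-based motion artifact suppression step is omitted.

```python
import numpy as np

FS = 14.0  # Hz, the sampling rate reported for the CFIHSR recordings


def preprocess(ppg, fs=FS, band=(0.5, 4.0), target_len=None):
    """Detrend, band-limit, optionally Fourier-resample, and z-score a segment.

    The pass band is an assumed heart-rate range, not a value from the paper.
    """
    x = np.asarray(ppg, dtype=float)
    n = x.size
    # Baseline drift removal: subtract a least-squares linear trend.
    t = np.arange(n)
    x = x - np.polyval(np.polyfit(t, x, 1), t)
    # Band-pass filtering by zeroing FFT bins outside the pass band.
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
    if target_len is not None and target_len != n:
        # Fourier-based resampling: zero-pad (or truncate) the spectrum.
        new_spec = np.zeros(target_len // 2 + 1, dtype=complex)
        k = min(new_spec.size, spec.size)
        new_spec[:k] = spec[:k]
        x = np.fft.irfft(new_spec, n=target_len) * (target_len / n)
    else:
        x = np.fft.irfft(spec, n=n)
    # Amplitude normalization to zero mean and unit variance.
    return (x - x.mean()) / (x.std() + 1e-12)


def morlet_scalogram(x, fs=FS, freqs=None, w0=6.0):
    """Return (freqs, |CWT|) using an analytic Morlet wavelet applied via FFT.

    Each row is amplitude-normalized, so a unit-amplitude sinusoid yields a
    magnitude close to 1 in the row matching its frequency.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if freqs is None:
        freqs = np.linspace(0.5, 4.0, 32)  # assumed analysis grid in Hz
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s per FFT bin
    x_hat = np.fft.fft(x)
    scales = w0 / (2.0 * np.pi * freqs)  # scale whose centre frequency is f
    mag = np.empty((freqs.size, n))
    for i, s in enumerate(scales):
        # Frequency response of the wavelet at scale s, positive freqs only.
        psi_hat = 2.0 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        mag[i] = np.abs(np.fft.ifft(x_hat * psi_hat))
    return freqs, mag


# Synthetic 10 s segment: a 1.5 Hz (90 bpm) pulse plus linear baseline drift.
t = np.arange(140) / FS
raw = np.sin(2 * np.pi * 1.5 * t) + 0.5 * t + 3.0
clean = preprocess(raw)
freqs, scalogram = morlet_scalogram(clean)
print(scalogram.shape, freqs[np.argmax(scalogram.mean(axis=1))])
```

The resulting two-dimensional array (frequency rows by time columns) is the kind of image-like input that convolutional or transformer branches can consume, while the cleaned one-dimensional segment could feed a recurrent branch. How the actual model splits its inputs is not specified here.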
