LatentPrintFormer: A Hybrid CNN-Transformer with Spatial Attention for Latent Fingerprint Identification
Arnab Maity, Manasa, Pavan Kumar C, Raghavendra Ramachandra
TL;DR
This work addresses latent fingerprint identification under challenging conditions (noise, background clutter, partial impressions) by introducing LatentPrintFormer, a lightweight hybrid CNN-Transformer that fuses local and global features. The architecture combines EfficientNet-B0 and Swin Tiny with a Spatial Attention Module to produce a $512$-dimensional embedding, which is trained with ArcFace loss and matched by cosine similarity in a closed-set setting. Experiments on IIITD-Latent and LFIW show consistent Rank-10 improvements over three state-of-the-art baselines, and ablation studies confirm the contributions of spatial attention and the Transformer branch. The results suggest potential for forensic workflows and point to robustness across sensors and environments as an open direction.
Abstract
Latent fingerprint identification remains a challenging task due to low image quality, background noise, and partial impressions. In this work, we propose a novel identification approach called LatentPrintFormer. The proposed model integrates a CNN backbone (EfficientNet-B0) and a Transformer backbone (Swin Tiny) to extract both local and global features from latent fingerprints. A spatial attention module is employed to emphasize high-quality ridge regions while suppressing background noise. The extracted features are fused and projected into a unified 512-dimensional embedding, and matching is performed using cosine similarity in a closed-set identification setting. Extensive experiments on two publicly available datasets demonstrate that LatentPrintFormer consistently outperforms three state-of-the-art latent fingerprint recognition techniques, achieving higher Rank-10 identification rates.
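The closed-set matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `rank_gallery`, `rank_k_rate`), the identity labels, and the toy 3-dimensional vectors are all hypothetical, standing in for the 512-dimensional embeddings the model would produce. The sketch assumes each gallery identity has a single enrolled embedding and that no embedding is the zero vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(probe, gallery):
    """Return gallery identities ordered from most to least similar to the probe."""
    scored = [(cosine_similarity(probe, emb), identity)
              for identity, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [identity for _, identity in scored]


def rank_k_rate(probes, gallery, k):
    """Fraction of probes whose true identity appears among the top-k candidates."""
    hits = sum(1 for true_id, emb in probes
               if true_id in rank_gallery(emb, gallery)[:k])
    return hits / len(probes)


# Hypothetical toy data: 3-dimensional vectors in place of 512-dimensional embeddings.
gallery = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
}
probes = [
    ("A", [0.9, 0.1, 0.0]),
    ("B", [0.6, 0.5, 0.1]),  # closest to "A", so a Rank-1 miss but a Rank-2 hit
    ("C", [0.1, 0.2, 0.8]),
]

rank1 = rank_k_rate(probes, gallery, k=1)
rank2 = rank_k_rate(probes, gallery, k=2)
```

In this toy setup two of the three probes are identified at Rank-1 and all three at Rank-2, which shows why a Rank-k rate can only stay equal or grow as k increases. Because the setting is closed-set, every probe is assumed to have a mate in the gallery, so no rejection threshold is applied.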
