LEADER: Lightweight End-to-End Attention-Gated Dual Autoencoder for Robust Minutiae Extraction

Raffaele Cappelli, Matteo Ferrara

TL;DR

This paper introduces LEADER (Lightweight End-to-end Attention-gated Dual autoencodER), a neural network that maps raw fingerprint images to minutiae descriptors, including location, direction, and type, using only 0.9M parameters.

Abstract

Minutiae extraction, a fundamental stage in fingerprint recognition, is increasingly shifting toward deep learning. However, truly end-to-end methods that eliminate separate preprocessing and postprocessing steps remain scarce. This paper introduces LEADER (Lightweight End-to-end Attention-gated Dual autoencodER), a neural network that maps raw fingerprint images to minutiae descriptors, including location, direction, and type. The proposed architecture integrates non-maximum suppression and angular decoding to enable complete end-to-end inference using only 0.9M parameters. It employs a novel "Castle-Moat-Rampart" ground-truth encoding and a dual-autoencoder structure, interconnected through an attention-gating mechanism. Experimental evaluations demonstrate state-of-the-art accuracy on plain fingerprints and robust cross-domain generalization to latent impressions. Specifically, LEADER attains a 34% higher F1-score on the NIST SD27 dataset compared to specialized latent minutiae extractors. Sample-level analysis on this challenging benchmark reveals an average rank of 2.07 among all compared methods, with LEADER securing the first-place position in 47% of the samples, more than doubling the frequency of the second-best extractor. The internal representations learned by the model align with established fingerprint domain features, such as segmentation masks, orientation fields, frequency maps, and skeletons. Inference requires 15 ms on GPU and 322 ms on CPU, outperforming leading commercial software in computational efficiency. The source code and pre-trained weights are publicly released to facilitate reproducibility.
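
The abstract names two steps that are folded into the network graph: non-maximum suppression and angular decoding. As a rough illustration only, the NumPy sketch below shows one common way such steps can be realized: a sliding-window maximum (equivalent to stride-1 max pooling) to keep local peaks of a position heatmap, and `arctan2` to recover an angle from a cosine/sine pair. The function names, the window size, the threshold, and the cos/sin direction encoding are all assumptions made for this example; the abstract does not specify how LEADER implements either step.

```python
import numpy as np

def local_max_nms(prob, window=5, threshold=0.5):
    """Keep pixels above `threshold` that equal the maximum of their window.

    `window` and `threshold` are hypothetical values, not taken from the paper.
    """
    pad = window // 2
    padded = np.pad(prob, pad, mode="constant", constant_values=-np.inf)
    # Sliding-window maximum, equivalent to a stride-1 max-pooling layer.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_max = windows.max(axis=(-2, -1))
    keep = (prob >= threshold) & (prob == local_max)
    return np.argwhere(keep)  # one (row, col) pair per surviving peak

def decode_angle(cos_map, sin_map, peaks):
    """Recover directions in [0, 2*pi) from assumed cos/sin channels."""
    rows, cols = peaks[:, 0], peaks[:, 1]
    theta = np.arctan2(sin_map[rows, cols], cos_map[rows, cols])
    return np.mod(theta, 2 * np.pi)

# Toy heatmap: two true peaks and one weaker neighbour that should be suppressed.
prob = np.zeros((8, 8))
prob[2, 3], prob[2, 4], prob[6, 6] = 0.9, 0.6, 0.8
angles = np.zeros((8, 8))
angles[2, 3], angles[6, 6] = np.pi / 2, 3 * np.pi / 2

peaks = local_max_nms(prob)
theta = decode_angle(np.cos(angles), np.sin(angles), peaks)
```

Because both operations are expressible as pooling and elementwise arithmetic, they can live inside the model graph rather than in separate postprocessing code, which is the property the abstract emphasizes.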

Paper Structure (37 sections, 8 equations, 21 figures, 9 tables)

Figures (21)

  • Figure 1: Overview of fingerprint patterns and extraction paradigms. (a) Local ridge structure highlighting a ridge ($\mathrm{a}_1$), a valley ($\mathrm{a}_2$), a ridge ending ($\mathrm{a}_3$), and a bifurcation ($\mathrm{a}_4$), with their corresponding directions $\theta$. (b) Conventional deep learning-based pipeline: the input image ($\mathrm{b}_1$) is typically preprocessed ($\mathrm{b}_2$) and fed into a neural network to generate heatmaps ($\mathrm{b}_3$), requiring off-graph postprocessing to obtain the final minutiae ($\mathrm{b}_4$). (c) Traditional extraction workflow: the input image ($\mathrm{c}_1$) is enhanced ($\mathrm{c}_2$), binarized ($\mathrm{c}_3$), and thinned to a ridge skeleton ($\mathrm{c}_4$) before connectivity analysis yields the minutiae ($\mathrm{c}_5$).
  • Figure 2: High-level diagram of the LEADER model. The network processes the fingerprint $\mathbf{F}$ through a pipeline composed of the Stem, the Context-Autoencoder, the Attention-Gate, the Refinement-Autoencoder, and the multi-task Head producing position ($\widehat{\mathbf{P}}, \widetilde{\mathbf{P}}$), direction ($\widehat{\mathbf{D}}$), and type ($\widehat{\mathbf{T}}$) maps.
  • Figure 3: Ground-truth encoding and adaptive spatial weighting. (a) Detail of a fingerprint image with annotated minutiae. (b) Generated binary position map $\mathbf{P}$: the morphology of the positive components is dynamically constrained in high-density regions to prevent spatial overlap. (c) Corresponding weight map $\mathbf{W}$, derived from the CMR geometry. The penalty zones adaptively reshape to maintain separation between closely spaced minutiae; the castle regions in $\mathbf{W}$ correspond to positive pixels in $\mathbf{P}$, while the ramparts provide localized penalty peaks to enforce sharp localization.
  • Figure 4: Radial profile of the CMR weighting strategy. Left: 2D weight map $\mathbf{W}$ for an isolated minutia. Right: Profile of the weighting function $\omega(\rho_{\min}(\cdot))$ along a cross-section from the center (pixels are highlighted in red). The castle ($w_{ij}=1$) covers the central region—nominally of radius $\delta$, but dynamically adjusted in high-density areas—followed by a zero-gradient moat of width $\beta$, which effectively extends into the lower tail of the Gaussian slope. The rampart rises to a peak penalty ($w_{ij}=1$) to enforce sharp localization before descending to the background plateau $\lambda$.
  • Figure 5: Representative fingerprint samples from the datasets. The collection covers diverse acquisition technologies: (a) FVC2002 DB2-A (Optical); (b) FVC2002 DB3-A (Capacitive); (c) FVC2004 DB1-A (Optical); (d) FVC2004 DB3-A (Thermal sweeping); (e) FFE Good (Optical); (f) FFE Bad (Optical); (g) FVC2000 DB2-A (Capacitive); (h) FVC2002 DB1-A (Optical); (i) NIST SD27 (Latent).
  • ...and 16 more figures
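
The radial weighting profile described in the Figure 4 caption (castle, moat, rampart, background plateau) can be pictured with a small piecewise function. The sketch below is only a guess at the general shape: the caption gives the castle weight (1), a zero-weight moat of width $\beta$, a rampart peak of 1, and a plateau $\lambda$, but not the exact formula. The Gaussian form of the rampart, the peak radius `peak`, the spread `sigma`, and every numeric default are hypothetical values chosen for illustration, and `rho` is assumed to be the distance from a pixel to the nearest minutia.

```python
import numpy as np

def cmr_profile(rho, delta=3.0, beta=2.0, peak=8.0, sigma=1.0, lam=0.2):
    """Illustrative castle-moat-rampart weight as a function of distance rho.

    All parameter values and the Gaussian rampart shape are assumptions.
    """
    rho = np.asarray(rho, dtype=float)
    bump = np.exp(-((rho - peak) ** 2) / (2.0 * sigma ** 2))
    # Rampart: rises from ~0 to 1 at `peak`, then decays to the plateau `lam`.
    w = np.where(rho <= peak, bump, lam + (1.0 - lam) * bump)
    # Moat: zero weight (hence zero gradient) just outside the castle.
    w = np.where(rho <= delta + beta, 0.0, w)
    # Castle: full weight on the positive pixels around the minutia.
    w = np.where(rho <= delta, 1.0, w)
    return w

# Sample the profile at the castle, the moat, the rampart peak, and far away.
weights = cmr_profile([0.0, 4.0, 8.0, 100.0])
```

With these placeholder parameters the rising side of the Gaussian is already close to zero where the moat ends, which mirrors the caption's remark that the moat effectively extends into the lower tail of the slope.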