Leveraging Large-Scale Face Datasets for Deep Periocular Recognition via Ocular Cropping
Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun
TL;DR
This work investigates periocular recognition by training three CNN backbones of varying depth (1.24M, 3.5M, and 25.6M parameters) on 1,907,572 ocular crops from VGGFace2. It evaluates the resulting ocular representations on both the unconstrained VGGFace2-Pose dataset and the controlled UFPR-Periocular dataset, finding EERs of 9-15% on VGGFace2-Pose and 1-2% on UFPR-Periocular, with fusion of MobileNetv2 and ResNet50 providing notable gains and approaching state-of-the-art on UFPR-Periocular. The study finds that starting from ImageNet-pretrained weights generally works better than starting from face-recognition models when fine-tuning on ocular data, and that network fusion yields robust improvements across conditions. Overall, the results demonstrate the viability of using large-scale face datasets to learn periocular representations and identify practical paths for further gains, such as margin-based losses and cross-dataset sequential fine-tuning.
Abstract
We focus on ocular biometrics, specifically the periocular region (the area around the eye), which offers high discrimination and minimal acquisition constraints. We evaluate three Convolutional Neural Network architectures of varying depth and complexity to assess their effectiveness for periocular recognition. The networks are trained on 1,907,572 ocular crops extracted from the large-scale VGGFace2 database. This contrasts sharply with existing works, which typically train on small-scale periocular datasets of only a few thousand images. Experiments are conducted with ocular images from VGGFace2-Pose, a subset of VGGFace2 containing in-the-wild face images, and the UFPR-Periocular database, which consists of selfies captured via mobile devices with on-screen user guidance. Due to the uncontrolled conditions of VGGFace2, the Equal Error Rates (EERs) obtained with ocular crops range from 9% to 15%, noticeably higher than the 3-6% EERs achieved using full-face images. In contrast, UFPR-Periocular yields significantly better performance (EERs of 1-2%), thanks to higher image quality and more consistent acquisition protocols. To the best of our knowledge, these are the lowest reported EERs on the UFPR dataset to date.
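For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor comparisons accepted) equals the false rejection rate (genuine comparisons rejected). The following is a minimal sketch of one common way to approximate it from comparison scores, not the evaluation code used in this work. The function name `equal_error_rate`, the assumption that higher scores mean greater similarity, and the toy scores are all illustrative.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores (higher = more similar).

    Returns (eer, threshold), where threshold is the candidate threshold
    at which FAR and FRR are closest to each other.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # FRR: fraction of genuine pairs rejected (score below the threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # FAR: fraction of impostor pairs accepted (score at or above the threshold).
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest,
    # and report their average as the EER estimate.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


# Toy scores for illustration only (not data from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {thr:.2f}")
```

With only a few scores the estimate is coarse; on large score sets, interpolating between adjacent thresholds gives a smoother value.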
