Exploring Complementarity and Explainability in CNNs for Periocular Verification Across Acquisition Distances
Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Jose M. Buades, Kiran Raja, Josef Bigun
TL;DR
This work addresses periocular verification under varying acquisition distances by comparing three CNNs of increasing complexity (SqueezeNet, MobileNetv2, ResNet50), trained on a large pool of ocular crops from VGGFace2 and evaluated on UBIPr. It compares two similarity metrics (cosine and χ²), applies score-level fusion via logistic regression, and uses explainability tools (LIME heatmaps and Jensen-Shannon divergence) to analyze attention patterns, which shows that the networks usually attend to distinct regions of the same image. ResNet50 is the strongest individual network, but fusing all three yields substantial gains and state-of-the-art results on UBIPr, indicating that architectural diversity can improve robustness to distance variations. The study also highlights the value of explainability in guiding architectural decisions and fusion strategies, with practical implications for periocular biometrics in unconstrained settings.
Abstract
We study the complementarity of different CNNs for periocular verification at different distances on the UBIPr database. We train three architectures of increasing complexity (SqueezeNet, MobileNetv2, and ResNet50) on a large set of eye crops from VGGFace2. We analyse performance with cosine and χ² metrics, compare different network initialisations, and apply score-level fusion via logistic regression. In addition, we use LIME heatmaps and Jensen-Shannon divergence to compare attention patterns of the CNNs. While ResNet50 consistently performs best individually, the fusion provides substantial gains, especially when combining all three networks. Heatmaps show that networks usually focus on distinct regions of a given image, which explains their complementarity. Our method significantly outperforms previous work on UBIPr, achieving a new state-of-the-art.
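As a rough illustration of the quantities named in the abstract, the sketch below computes a cosine similarity and a χ² distance between two feature vectors, and a Jensen-Shannon divergence between two heatmaps. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the χ² form used here (sum of squared differences over sums) is one common definition and the paper's exact variant may differ, features are assumed non-negative, and heatmaps are assumed to be flattened, non-negative weights that are normalised to sum to one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def chi2_distance(a, b, eps=1e-12):
    """Chi-squared distance (lower = more similar).

    Assumption: non-negative features and the form sum((a-b)^2 / (a+b));
    eps only guards against division by zero when both entries are 0.
    """
    return sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1]).

    Assumption: p and q are flattened heatmaps with non-negative weights;
    each is normalised to a probability distribution first.
    """
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Under these assumptions, a divergence near 0 means two networks weight the same image regions, while a value near 1 means their heatmaps barely overlap, which is the situation in which fusing their scores is most likely to help.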
