
Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez

TL;DR

This work identifies a core limitation of CNNs in extracting orientation features from grayscale inputs and addresses it by injecting orientation-rich signals through Complex Structure Tensor (CST) descriptors. CST succinctly encodes local power-spectrum orientation moments and is computed via a mini complex conv-net with three separable filters, enabling CNNs to leverage orientation cues upfront. Across six well-known CNNs and two periocular datasets (Cross-Eyed and PolyU) in NIR and VIS spectra, CST inputs consistently improve identification accuracy and reduce EER, while enabling substantial network compression and faster convergence. The findings suggest practical benefits for biometric systems on thin clients and motivate future integration with learnable hyperparameters and hybrid architectures such as vision transformers.

Abstract

Our study provides evidence that CNNs struggle to effectively extract orientation features. We show that using the Complex Structure Tensor (CST), which contains compact orientation features with certainties, as input to CNNs consistently improves identification accuracy compared to using grayscale inputs alone. Experiments also demonstrated that our inputs, which were provided by mini complex conv-nets, combined with reduced CNN sizes, outperformed full-fledged, prevailing CNN architectures. This suggests that the upfront use of orientation features in CNNs, a strategy seen in mammalian vision, not only mitigates their limitations but also enhances their explainability and relevance to thin clients. Experiments were done on publicly available datasets comprising periocular images for biometric identification and verification (closed-world and open-world) using six state-of-the-art (SOA) CNN architectures. We reduced the SOA Equal Error Rate (EER) on the PolyU dataset by 5-26%, depending on data and scenario.
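
To make the notion of "orientation features with certainties" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard structure-tensor moments $I_{20} = \sum (f_x + i f_y)^2$ and $I_{11} = \sum |f_x + i f_y|^2$, replaces the paper's separable filters with plain central differences, and pools over the whole image instead of a local window. The function names `complex_structure_tensor` and `planar_wave` are hypothetical.

```python
import cmath
import math


def complex_structure_tensor(image):
    """Return (I20, I11) pooled over the image interior.

    Assumption: central differences stand in for the derivative filters,
    and the sums run over the whole image rather than a local window.
    """
    height, width = len(image), len(image[0])
    i20 = 0j
    i11 = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            fx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            fy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            g = complex(fx, fy)   # gradient as a complex number
            i20 += g * g          # squaring doubles the gradient angle
            i11 += abs(g) ** 2    # total gradient energy
    return i20, i11


def planar_wave(size, theta, freq=0.3):
    """Synthetic linearly symmetric texture whose gradient points along theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[math.cos(freq * (c * x + s * y)) for x in range(size)]
            for y in range(size)]


theta = math.radians(30)
i20, i11 = complex_structure_tensor(planar_wave(32, theta))
doubled_angle = math.degrees(cmath.phase(i20))  # roughly twice the 30-degree gradient angle
certainty = abs(i20) / i11                      # near 1 for a single dominant orientation
```

For a texture with one dominant orientation, the argument of $I_{20}$ is about twice the gradient angle and $|I_{20}|/I_{11}$ approaches 1. For isotropic textures the squared gradients cancel and the ratio drops toward 0.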

Paper Structure (15 sections, 4 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: a), b) Examples of 1-folded (linearly) symmetric textures of planar waves with $I_{20}$ and gradient vectors; c) ditto, but 2-folded symmetric, with $n=2$. The angle of $I_{20}$ (red arrows) is twice that of the gradient (blue arrows). FT magnitudes are shown below each texture.
  • Figure 2: Pipeline of the proposed method.
  • Figure 3: Effect of hyperparameters on the CST output, shown in HSV colors where hue is modulated by $\angle I_{20}$ (i.e. $0^\circ$ is mapped to red), saturation by $I_{11}$, and value by $|I_{20}|$. When the latter two are high (vivid colors), certainty is high.
  • Figure 4: Example images from the databases employed.
  • Figure 5: Validation loss during training for a randomly initialized ResNet50 network using different input data.
  • ...and 3 more figures
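
The HSV encoding described in the Figure 3 caption can be sketched per pixel as follows. This is an illustrative assumption, not the authors' plotting code: the helper `cst_to_rgb` is hypothetical, and normalizing $I_{11}$ and $|I_{20}|$ by their maxima (`i11_max`, `i20_max`) is a choice made here so that saturation and value fall in $[0, 1]$.

```python
import cmath
import colorsys
import math


def cst_to_rgb(i20, i11, i11_max, i20_max):
    """Map one CST sample to RGB using the HSV convention of the caption.

    Hue follows the angle of I20 (0 degrees maps to red), saturation follows
    I11, and value follows |I20|. Normalizing by the maxima is an assumption.
    """
    hue = (cmath.phase(i20) % (2 * math.pi)) / (2 * math.pi)
    sat = min(i11 / i11_max, 1.0) if i11_max > 0 else 0.0
    val = min(abs(i20) / i20_max, 1.0) if i20_max > 0 else 0.0
    return colorsys.hsv_to_rgb(hue, sat, val)


strong_horizontal = cst_to_rgb(1 + 0j, 1.0, 1.0, 1.0)  # angle 0, high certainty
flat_region = cst_to_rgb(0j, 0.0, 1.0, 1.0)            # no gradient energy
```

A sample with angle zero and maximal $I_{11}$ and $|I_{20}|$ renders as pure red, while a flat region with no gradient energy renders as black, matching the caption's reading that vivid colors indicate high certainty.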