Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study
Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez
TL;DR
This work identifies a core limitation of CNNs in extracting orientation features from grayscale inputs and addresses it by injecting orientation-rich signals through Complex Structure Tensor (CST) descriptors. CST succinctly encodes local power-spectrum orientation moments and is computed via a mini complex conv-net with three separable filters, enabling CNNs to leverage orientation cues upfront. Across six well-known CNNs and two periocular datasets (Cross-Eyed and PolyU) in NIR and VIS spectra, CST inputs consistently improve identification accuracy and reduce EER, while enabling substantial network compression and faster convergence. The findings suggest practical benefits for biometric systems on thin clients and motivate future integration with learnable hyperparameters and hybrid architectures such as vision transformers.
Abstract
Our study provides evidence that CNNs struggle to extract orientation features effectively. We show that using the Complex Structure Tensor, which contains compact orientation features with certainties, as input to CNNs consistently improves identification accuracy compared to using grayscale inputs alone. Experiments also demonstrate that our inputs, provided by mini complex conv-nets and combined with reduced CNN sizes, outperform full-fledged, prevailing CNN architectures. This suggests that the upfront use of orientation features in CNNs, a strategy seen in mammalian vision, not only mitigates their limitations but also enhances their explainability and relevance to thin clients. Experiments were conducted on publicly available datasets comprising periocular images for biometric identification and verification (closed- and open-world) using six state-of-the-art (SOA) CNN architectures. We reduced the SOA Equal Error Rate (EER) on the PolyU dataset by 5-26%, depending on data and scenario.
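To make the idea of "orientation features with certainties" concrete, the following is a minimal NumPy sketch of the classical complex structure tensor, not the authors' implementation. It assumes Gaussian-derivative filters for the gradient and a Gaussian for averaging, which is one possible reading of the "three separable filters" mentioned above. The function names, the scales `sigma_d` and `sigma_a`, and the output names `i20` and `i11` are illustrative assumptions.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Return a 1D Gaussian and its derivative sampled on [-3*sigma, 3*sigma]."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -x / sigma**2 * g  # analytic derivative of the Gaussian
    return g, dg


def separable_filter(img, k_axis0, k_axis1):
    """Convolve along axis 1 with k_axis1, then along axis 0 with k_axis0.

    Uses zero padding; each image side must be at least as long as the kernel.
    """
    out = np.apply_along_axis(np.convolve, 1, img, k_axis1, mode="same")
    out = np.apply_along_axis(np.convolve, 0, out, k_axis0, mode="same")
    return out


def complex_structure_tensor(img, sigma_d=1.0, sigma_a=3.0):
    """Return (i20, i11): complex orientation moment and real gradient energy."""
    img = np.asarray(img, dtype=float)
    g, dg = gaussian_kernels(sigma_d)
    fx = separable_filter(img, g, dg)   # derivative along axis 1 (x)
    fy = separable_filter(img, dg, g)   # derivative along axis 0 (y)
    grad = fx + 1j * fy                 # complex gradient
    ga, _ = gaussian_kernels(sigma_a)
    i20 = separable_filter(grad**2, ga, ga)          # double-angle orientation
    i11 = separable_filter(np.abs(grad)**2, ga, ga)  # total local energy
    return i20, i11


# Hypothetical test pattern: stripes that vary along axis 1 only.
cols = np.arange(64)
stripes = np.tile(np.sin(2 * np.pi * cols / 8.0), (64, 1))
i20, i11 = complex_structure_tensor(stripes)

# Dominant gradient direction (half the argument of i20) and a certainty
# measure in [0, 1] at the image centre, away from zero-padding effects.
orientation = 0.5 * np.angle(i20[32, 32])
certainty = np.abs(i20[32, 32]) / i11[32, 32]
```

In this sketch the argument of `i20` encodes the local orientation in double-angle form, and the ratio `|i20| / i11` approaches 1 where the neighbourhood has a single dominant orientation. A CNN could then receive the real and imaginary parts of `i20`, together with `i11`, as input channels in place of raw grayscale values; how the paper packs these channels is not stated in this section.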
