
EdgeEar: Efficient and Accurate Ear Recognition for Edge Devices

Camile Lendering, Bernardo Perrone Ribeiro, Žiga Emeršič, Peter Peer

TL;DR

EdgeEar addresses the need for accurate ear recognition on resource-limited edge devices with a lightweight hybrid CNN–Transformer model that incorporates selective low-rank linear layers (LoRaLin) within SDTA modules. With fewer than $2\,\mathrm{M}$ parameters, EdgeEar achieves a competitive $\mathrm{EER}=0.143$, $\mathrm{AUC}=0.904$, and Rank-1 accuracy $R1=0.929$ on the UERC2023 benchmark, while using roughly 50 times fewer parameters than the previous best model (1.98 M vs. 97.0 M) and substantially reducing computational cost. The method introduces a targeted replacement strategy for the QKV projections, and ablations show that cross-entropy (CE) loss with label smoothing best preserves performance. This work shows that compact, edge-optimized ear biometrics are feasible and highlights the importance of larger, more diverse datasets for reducing demographic bias and advancing real-world deployment.
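The ablation result on label smoothing concerns the training loss. As a minimal sketch, assuming the common formulation in which the one-hot target is mixed with a uniform distribution over K classes, q_k = (1 - eps)·[k = target] + eps/K, the per-sample loss can be computed as follows. The function names, the smoothing factor `eps`, and the example logits are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a 1-D list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def ce_label_smoothing(logits: Sequence[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a label-smoothed target distribution.

    Assumed formulation: q_k = (1 - eps) * [k == target] + eps / K,
    loss = -sum_k q_k * log p_k.
    """
    k = len(logits)
    log_p = log_softmax(logits)
    loss = 0.0
    for i, lp in enumerate(log_p):
        q = (1.0 - eps) * (1.0 if i == target else 0.0) + eps / k
        loss -= q * lp
    return loss


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]  # hypothetical logits for 3 identities
    print(f"plain CE:    {ce_label_smoothing(logits, target=0, eps=0.0):.4f}")
    print(f"smoothed CE: {ce_label_smoothing(logits, target=0, eps=0.1):.4f}")
```

With `eps = 0` the function reduces to ordinary cross-entropy; a positive `eps` moves some target mass onto the other classes, which penalizes over-confident logits.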

Abstract

Ear recognition is a contactless and unobtrusive biometric technique with applications across various domains. However, deploying high-performing ear recognition models on resource-constrained devices is challenging, limiting their applicability and widespread adoption. This paper introduces EdgeEar, a lightweight model based on a proposed hybrid CNN-transformer architecture to solve this problem. By incorporating low-rank approximations into specific linear layers, EdgeEar reduces its parameter count by a factor of 50 compared to the current state-of-the-art, bringing it below two million while maintaining competitive accuracy. Evaluation on the Unconstrained Ear Recognition Challenge (UERC2023) benchmark shows that EdgeEar achieves the lowest EER while significantly reducing computational costs. These findings demonstrate the feasibility of efficient and accurate ear recognition, which we believe will contribute to the wider adoption of ear biometrics.
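The parameter savings come from replacing selected dense weights with low-rank approximations. A minimal sketch of the general idea, assuming a LoRaLin-style layer factorizes a weight W of shape (d_out, d_in) into two thin matrices U (d_out × r) and V (r × d_in), is shown below in plain Python. The helper names, layer sizes, and rank are hypothetical illustrations, not EdgeEar's actual configuration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def dense_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameter count of y = W x + b, with W of shape (d_out, d_in)."""
    return d_out * d_in + (d_out if bias else 0)


def low_rank_params(d_in: int, d_out: int, rank: int, bias: bool = True) -> int:
    """Parameter count of y = U (V x) + b, with V: (rank, d_in), U: (d_out, rank)."""
    return rank * d_in + d_out * rank + (d_out if bias else 0)


def low_rank_linear(x: Sequence[float], v: Matrix, u: Matrix, b: Sequence[float]) -> List[float]:
    """Forward pass: project x down to `rank` dims with V, back up with U, add bias.

    Equivalent to a dense layer whose weight is the product U @ V,
    but the full (d_out, d_in) matrix is never stored.
    """
    h = matvec(v, x)  # length = rank
    y = matvec(u, h)  # length = d_out
    return [yi + bi for yi, bi in zip(y, b)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_in, d_out, rank = 512, 512, 64
    print("dense:   ", dense_params(d_in, d_out))           # 262656
    print("low-rank:", low_rank_params(d_in, d_out, rank))  # 66048
```

The factorization only saves parameters when `rank * (d_in + d_out) < d_in * d_out`; for the 512 × 512 example, that means any rank below 256.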

Paper Structure

This paper contains 15 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Comparison plot of Equal Error Rate (EER) versus the number of parameters (in M) for each UERC2023 approach. EdgeEar achieves a new state-of-the-art EER of $0.143$, surpassing the previous best model's EER of $0.146$, while maintaining a significantly smaller parameter count of $1.98\,\mathrm{M}$ compared to $97.0\,\mathrm{M}$.
  • Figure 2: Schematic diagram of the proposed EdgeEar model, adapted from the EdgeFace [George_IEEETBIOM_2024] and EdgeNeXt [Maaz_edgenext] architectures. The modifications include selectively applying LoRaLin layers (highlighted in red) to three layers of the Stage 4 SDTA Encoder, while retaining full-rank linear layers for the remaining layers.
  • Figure 3: ROC curves for ethnicity and gender splits in the sequestered UERC2023 dataset. Each curve represents the model's performance for a specific group, with its AUC given in the legend.
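Figures 1 and 3 report EER and ROC/AUC. The following minimal sketch computes both metrics from verification scores. It assumes that higher scores mean more similar and that genuine (same-identity) and impostor comparison scores are already available. The function names and toy scores are hypothetical, and the official UERC2023 evaluation protocol may differ. For instance, EER is approximated here by scanning the observed scores as thresholds rather than by interpolating the ROC curve.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float], threshold: float) -> Tuple[float, float]:
    """False accept / false reject rates when accepting scores >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by trying every observed score as a threshold.

    Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest,
    reporting EER as the mean of FAR and FRR at that point.
    """
    best = None  # (gap, eer, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def roc_auc(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """AUC as P(genuine score > impostor score), counting ties as 1/2."""
    wins = 0.0
    for g in genuine:
        for i in impostor:
            wins += 1.0 if g > i else 0.5 if g == i else 0.0
    return wins / (len(genuine) * len(impostor))


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]   # hypothetical same-identity scores
    impostor = [0.6, 0.3, 0.2, 0.1]  # hypothetical different-identity scores
    print(equal_error_rate(genuine, impostor))  # (0.25, 0.6)
    print(roc_auc(genuine, impostor))           # 0.9375
```

The pairwise AUC loop is quadratic in the number of scores, which is fine for illustration. Large benchmarks would normally sort the scores once or use a library ROC routine instead.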