Local and Global Feature Attention Fusion Network for Face Recognition

Wang Yu, Wei Wei

TL;DR

This work tackles face recognition under low-quality conditions by introducing LGAF, a network that adaptively fuses local and global features guided by a feature-quality proxy. The Local and Global Feature Fusion (LGF) module combines normalized feature energies to allocate attention between local and global cues, while the Multi-Head Multi-Scale Local Feature Extraction (MHMS) module enriches local information across scales. In experiments and ablations, LGAF achieves the best average performance on four validation sets (CFP-FP, CPLFW, AgeDB, and CALFW) and outperforms prior state-of-the-art methods on the low-quality TinyFace and SCFace benchmarks. The approach highlights the importance of feature-quality-aware fusion for stable recognition when facial regions are missing or deformed.

Abstract

Recognition of low-quality face images remains a challenge because partial facial regions may be invisible or deformed. For low-quality images dominated by missing facial regions, local region similarity contributes more to face recognition (FR). Conversely, in cases dominated by local face deformation, excessive attention to local regions may lead to misjudgments, while global features are more robust. However, most existing FR methods neglect the bias in feature quality that different degradation factors introduce in low-quality images. To address this issue, we propose a Local and Global Feature Attention Fusion (LGAF) network based on feature quality. The network adaptively allocates attention between local and global features according to feature quality and obtains more discriminative, higher-quality face features by exploiting the complementarity of local and global information. In addition, to effectively obtain fine-grained information at various scales and increase the separability of facial features in high-dimensional space, we introduce a Multi-Head Multi-Scale Local Feature Extraction (MHMS) module. Experimental results demonstrate that LGAF achieves the best average performance on four validation sets (CFP-FP, CPLFW, AgeDB, and CALFW), and that its performance on TinyFace and SCFace surpasses that of state-of-the-art (SoTA) methods.
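
The quality-driven allocation of attention described in the abstract can be illustrated with a small sketch. This is not the paper's LGF module: it assumes that the L2 norm of each feature vector serves as the quality proxy, that the attention weights are a two-way softmax over those norms, and that each branch is L2-normalised before mixing. The function names `l2_norm` and `fuse_by_quality` and the toy vectors are hypothetical.

```python
import math


def l2_norm(vec):
    """Return the Euclidean norm of a feature vector (the assumed quality proxy)."""
    return math.sqrt(sum(x * x for x in vec))


def fuse_by_quality(local_feat, global_feat):
    """Fuse two equal-length feature vectors with norm-derived attention weights.

    Assumption (not from the paper): the weights are a two-way softmax over
    the feature norms, and each branch is L2-normalised before mixing.
    Returns (fused_vector, w_local, w_global).
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("local and global features must have the same length")

    q_local = l2_norm(local_feat)
    q_global = l2_norm(global_feat)

    # Two-way softmax; subtracting the max keeps exp() numerically stable.
    m = max(q_local, q_global)
    e_local = math.exp(q_local - m)
    e_global = math.exp(q_global - m)
    w_local = e_local / (e_local + e_global)
    w_global = 1.0 - w_local

    # Normalise each branch so the weights, not raw magnitude, set its share.
    unit_local = [x / q_local for x in local_feat] if q_local > 0 else list(local_feat)
    unit_global = [x / q_global for x in global_feat] if q_global > 0 else list(global_feat)

    fused = [w_local * a + w_global * b for a, b in zip(unit_local, unit_global)]
    return fused, w_local, w_global


# Toy vectors (made up for illustration): the local branch has the larger norm.
fused, w_local, w_global = fuse_by_quality([3.0, 4.0], [0.6, 0.8])
print(w_local > w_global)  # True: norm 5.0 versus norm 1.0
```

In this sketch the branch with the larger norm receives more attention, and equal norms split the weight evenly. A learned quality assessment module could replace the fixed softmax without changing the surrounding logic.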

Paper Structure

This paper contains 23 sections, 13 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Examples of low-quality face images caused by different factors. Each factor is illustrated using two contrasting face images. These factors generally increase the difficulty of FR.
  • Figure 2: The framework of the proposed Local and Global Feature Attention Fusion (LGAF) network. GAP represents a global average pooling operation, $a\times a$ conv represents a convolution operation with kernel size $a$, FC stands for a fully connected layer, and FQA represents a feature quality assessment module. The MHMS and GFE are responsible for extracting effective local features and global features, respectively. The LGF module adaptively allocates attention between local and global features and performs feature fusion. The MSNet utilizes LANet and SE Module to extract local features at various scales.
  • Figure 3: (a) Examples of feature norms at different yaw angles. Comments below the images give the yaw angle (local feature norm; global feature norm). (b) Examples of feature norms under different expressions. Comments below the images give the expression (local feature norm; global feature norm). (c) Examples of feature norms under different motion blur intensities. Comments below the images give the moving interval pixel length (local feature norm; global feature norm).
  • Figure 4: Examples of the mean and standard deviation of the feature norm under different poses, ages, and motion blur intensities, as well as the correlation scores between feature norm and pose, age, and motion blur. (a) Local and global average feature norm at absolute yaw angles between 0 and 90 degrees. (b) Local and global average feature norm under different expressions. (c) Local and global average feature norm under varying motion blur intensities. (d) The correlation scores between the influencing factors of low-quality face images and feature norm.
  • Figure 5: Feature norm distribution statistics and partial sample visualization on TinyFace.
  • ...and 1 more figures
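
The correlation scores mentioned in the Figure 4 caption relate a degradation factor (such as yaw angle) to the feature norm. The caption does not say which correlation measure is used, so the sketch below assumes the Pearson coefficient. The helper names `feature_norm` and `pearson` are hypothetical, and the data is made up for illustration; it is not taken from the paper.

```python
import math


def feature_norm(vec):
    """Return the Euclidean (L2) norm of a feature vector."""
    return math.sqrt(sum(x * x for x in vec))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy data: four yaw angles and features whose norm shrinks with yaw.
yaw_degrees = [0.0, 30.0, 60.0, 90.0]
features = [[4.0, 0.0], [0.0, 3.0], [2.0, 0.0], [0.0, 1.0]]
norms = [feature_norm(f) for f in features]
score = pearson(yaw_degrees, norms)
print(score)  # close to -1 for this perfectly linear toy data
```

A strongly negative score in such an analysis would indicate that the feature norm drops as the degradation factor grows, which is the kind of relationship that makes the norm usable as a quality proxy.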