AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU

Zhuowen Zheng, Yain-Whar Si, Xiaochen Yuan, Junwei Duan, Ke Wang, Xiaofan Li, Xinyuan Zhang, Xueyuan Gong

TL;DR

The paper tackles the memory and compute bottlenecks of the fully connected head in large-scale face recognition by introducing AttFC, which replaces the FC with a Dynamic Class Container that stores a small, rotating set of Generative Class Centers generated via an attention loader. A two-encoder architecture, an attention-weighted GCC generator, and a masking mechanism work together to produce GCCs that closely resemble the true class centers, while significantly reducing parameters and enabling training on a single GPU. Empirical results on MS1MV3 and WebFace-21M show AttFC delivers substantial resource savings with accuracy comparable to or better than strong baselines, demonstrating scalable FR on commodity hardware. The work offers a practical path toward deploying FR systems on ultra-large identity sets and invites future improvements to generate GCCs from even fewer images or a single image without sacrificing performance.

Abstract

With the advancement of deep neural networks (DNNs) and the availability of large-scale datasets, face recognition (FR) models have achieved exceptional performance. However, the parameter count of the fully connected (FC) layer depends directly on the number of identities in the dataset, so training an FR model on a large-scale dataset makes the model excessively large and demands substantial computational resources, such as time and memory. This paper proposes the attention fully connected (AttFC) layer, which significantly reduces these resource requirements. AttFC employs an attention loader to generate the generative class center (GCC) and dynamically stores class centers in a Dynamic Class Container (DCC). The DCC stores only a small subset of all the class centers in the FC layer, so its parameter count is substantially smaller than that of the FC layer. Training face recognition models on large-scale datasets with one GPU often encounters out-of-memory (OOM) issues; AttFC overcomes this and achieves performance comparable to state-of-the-art methods.
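
To make the scaling argument concrete, the following is a minimal sketch of how the FC head grows with the number of identities. It assumes a weight matrix of $D \times N$ values with $D = 512$ (the feature dimension given in the Figure 1 caption) stored in 32-bit floats. The identity counts and the container size `container_slots` are hypothetical round numbers chosen for illustration, not values reported by the paper, and the function names are invented here.

```python
def fc_param_count(num_identities: int, feature_dim: int = 512) -> int:
    """Number of weights in an FC head holding one center per identity (D x N)."""
    return feature_dim * num_identities


def fp32_megabytes(num_params: int) -> float:
    """Memory for the weights alone at 4 bytes each (no gradients or optimizer state)."""
    return num_params * 4 / 1e6


if __name__ == "__main__":
    # Hypothetical identity counts, for illustration only.
    for n in (100_000, 1_000_000, 10_000_000):
        params = fc_param_count(n)
        print(f"N={n:>10,}  FC params={params:>14,}  ~{fp32_megabytes(params):,.0f} MB")

    # A container that keeps only a subset of centers scales with its
    # slot count instead of N. `container_slots` is a hypothetical value.
    container_slots = 50_000
    print(f"container params={fc_param_count(container_slots):,}")
```

The point of the sketch is only that the head's size is linear in $N$, while a container holding a fixed subset of class centers is independent of $N$.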

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: (a) The size of common large-scale face datasets. (b) The number of parameters in the backbone network and the FC layer. The FC layer weight $W$ has dimension $D \times N$, where $N$ is the number of identities and $D$ is the feature dimension (here $D = 512$).
  • Figure 2: Comparison of different strategies for generating the GCC. (a) Generating the GCC from one image: a GCC generated from a low-quality image may be dissimilar to the true class center (TCC) and to other normal features. (b) Generating the GCC from multiple images: the GCC is as similar as possible to the image features, but it may deviate further from the TCC if the contribution of low-quality images is too large. (c) Generating the GCC from multiple images based on attention weights: the contribution of low-quality images is decreased, so the GCC is closer to the TCC.
  • Figure 3: The architecture of AttFC. AttFC uses a feature encoder and a class encoder to obtain the identity feature and the class features, respectively. The attention loader generates the GCC from these features and stores it in the DCC. The DCC works as an FC layer for calculating the loss. Finally, the feature encoder is updated with SGD, while the class encoder is updated from the feature encoder by momentum.
  • Figure 4: If a class center conflict occurs during training, the superfluous $w^+_{cft}$ must be masked so that the probability of $f$ belonging to the class center $w^+_{cft}$ is 0.
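
The three mechanisms described in the captions for Figures 2 to 4 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the attention rule (softmax over cosine similarity to the plain mean, with a hypothetical `scale`), the exponential-moving-average form of the momentum update, and masking by setting a logit to negative infinity are all choices made here for illustration. All function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def attention_weighted_center(features, scale=10.0):
    """Weighted mean of same-identity features (the idea of Figure 2c).

    Assumption: weights are a softmax over cosine similarity to the plain
    mean, so outlying (e.g. low-quality) features contribute less.
    """
    feats = [normalize(f) for f in features]
    dim = len(feats[0])
    mean = normalize([sum(f[i] for f in feats) / len(feats) for i in range(dim)])
    weights = softmax([scale * dot(f, mean) for f in feats])
    center = [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]
    return normalize(center), weights


def momentum_update(class_params, feature_params, m=0.99):
    """Assumed EMA form: class encoder drifts slowly toward the feature encoder."""
    return [m * c + (1.0 - m) * f for c, f in zip(class_params, feature_params)]


def masked_probabilities(logits, conflict_indices):
    """Force probability 0 for conflicting centers by masking their logits."""
    masked = [-math.inf if i in conflict_indices else z
              for i, z in enumerate(logits)]
    return softmax(masked)


if __name__ == "__main__":
    # Two similar features and one outlier standing in for a low-quality image.
    center, weights = attention_weighted_center([[1.0, 0.0], [0.9, 0.1], [-0.2, 1.0]])
    print("weights:", weights)
    print("masked:", masked_probabilities([2.0, 1.0, 0.5], {1}))
```

In this toy run the outlying third feature receives the smallest weight, and the masked center at index 1 receives probability exactly 0, which mirrors the behavior the captions describe.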