Efficient Verification-Based Face Identification

Amit Rozner; Barak Battash; Ofir Lindenbaum; Lior Wolf

TL;DR

The paper tackles efficient face verification on edge devices by replacing a large embedding-based system with a per-identity binary verifier. A hypernetwork $h$ outputs the weights $\theta^i$ of a compact on-device model $f(\theta^i)$. Enrollment produces $\theta^i$ from a single image using a frozen backbone $h_{bb}$ and a trainable generator $h_{gen}$; inference uses only the small edge model and discards $h$ to minimize computation. The training regime combines weighted binary cross-entropy with a norm penalty and introduces K-means Centered Sampling (KCS) to build batches rich in hard negatives. The resulting model has $23{,}000$ parameters, requires $5\times 10^{6}$ floating point operations (FLOPS), and remains competitive across six datasets. Re-framing face verification as a personalized, edge-friendly task thus reduces memory and compute requirements while preserving performance, and offers a path to extending the idea to other domains.
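To make the training objective concrete, here is a minimal sketch of a weighted binary cross-entropy with a norm penalty on the generated weights. The summary does not give the exact formula, so the positive-class weight `pos_weight`, the squared L2 form of the penalty, and the coefficient `lam` are all assumptions chosen for illustration, not the authors' definitions.

```python
import math

def weighted_bce_with_norm_penalty(probs, labels, theta, pos_weight=10.0, lam=1e-4):
    """Hypothetical loss: positive-weighted BCE plus lam * ||theta||^2.

    probs  -- outputs of f(theta) in (0, 1), one per streamed sample
    labels -- 1 if the sample belongs to the enrolled identity, else 0
    theta  -- flat list of generated weights for this identity
    """
    eps = 1e-12  # keeps log() finite for saturated predictions
    bce = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        # Positives are up-weighted (assumed weighting scheme).
        bce -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= len(probs)
    penalty = lam * sum(w * w for w in theta)  # assumed squared L2 norm
    return bce + penalty

# One positive and three negatives scored by a single personalized model.
loss = weighted_bce_with_norm_penalty(
    probs=[0.9, 0.2, 0.1, 0.3], labels=[1, 0, 0, 0], theta=[0.5, -0.25, 0.1]
)
```

Setting `pos_weight=1.0` and `lam=0.0` recovers plain binary cross-entropy, which makes it easy to see the separate effect of each term.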

Abstract

We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest-neighbor search into a binary problem: each user has their own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate the model weights using a hypernetwork $h$. This leads to the generation of a compact personalized model for face identification that can be deployed on edge devices. Key to the method's success is a novel way of generating hard negatives and carefully scheduling the training objectives. Our model leads to a very compact $f$ requiring only 23k parameters and 5M floating point operations (FLOPS). We use six face verification datasets to demonstrate that our method is on par with or better than state-of-the-art models, with a significantly reduced number of parameters and computational burden. Furthermore, we perform an extensive ablation study to demonstrate the importance of each element in our method.
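The split between a one-time enrollment through $h$ and repeated inference through $f$ can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the sizes `IN_DIM`, `EMB_DIM`, and `HIDDEN`, the pooling stand-in for $h_{bb}$, the random linear stand-in for $h_{gen}$, and the one-hidden-layer form of $f$. None of it reflects the paper's actual architectures.

```python
import math
import random

IN_DIM, EMB_DIM, HIDDEN = 16, 8, 4               # toy sizes (assumed)
N_THETA = HIDDEN * IN_DIM + HIDDEN + HIDDEN + 1  # number of weights in f

def h_bb(image):
    """Stand-in for the frozen backbone: pools a flat image into an embedding."""
    step = IN_DIM // EMB_DIM
    return [sum(image[i:i + step]) / step for i in range(0, IN_DIM, step)]

def make_h_gen(seed=0):
    """Stand-in for the trainable generator: a linear map to N_THETA weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(EMB_DIM)] for _ in range(N_THETA)]

def enroll(image, gen):
    """Run h = h_gen(h_bb(.)) once to obtain the personalized weights theta."""
    emb = h_bb(image)
    return [sum(g * e for g, e in zip(row, emb)) for row in gen]

def f(theta, x):
    """Edge model: one hidden ReLU layer and a sigmoid output, weights from theta."""
    w1 = theta[:HIDDEN * IN_DIM]
    b1 = theta[HIDDEN * IN_DIM:HIDDEN * IN_DIM + HIDDEN]
    w2 = theta[HIDDEN * IN_DIM + HIDDEN:-1]
    b2 = theta[-1]
    hidden = [
        max(0.0, sum(w1[j * IN_DIM + k] * x[k] for k in range(IN_DIM)) + b1[j])
        for j in range(HIDDEN)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(1)
x_enroll = [rng.random() for _ in range(IN_DIM)]
theta_i = enroll(x_enroll, make_h_gen())  # done once; h is not needed afterwards
x_stream = [rng.random() for _ in range(IN_DIM)]
score = f(theta_i, x_stream)              # on-device: only f and theta_i are used
```

The point of the design is visible in the last two lines: after enrollment, only the flat weight list `theta_i` and the small function `f` have to live on the device.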

Paper Structure (14 sections, 9 equations, 4 figures, 2 tables)

This paper contains 14 sections, 9 equations, 4 figures, 2 tables.

Introduction
Background
Related Work
Deep Embedding Neural Networks for Facial Recognition
Face Recognition on Edge Devices
HyperNetwork Neural Network Architecture
Method
Enrollment Phase
Inference Phase
Training Phase
Experiments
Results
Discussion
Conclusions

Figures (4)

Figure 1: Illustration of the inference phase of our method. Enrollment is performed once per user $i$, using a single facial image $x^i_{enroll}$. The enrollment output is a set of weights $\theta^i$ of $f$, designed to fit low-power devices. In the streaming phase, face images $x_{stream}$ are presented to $f$, which then decides whether they belong to person $i$. In the streaming phase, $h$ is discarded; thus, the main computational burden is mitigated.
Figure 2: Comparing batches obtained with conventional sampling to those obtained with $K$-means centered sampling (KCS). (a, b) A random batch sampled during training with conventional sampling; (c, d) a random batch sampled during training with KCS.
Figure 3: The training phase of our method starts with a one-time enrollment procedure, denoted by dashed arrows. The weights $\theta$ are generated for all images in the batch $X$. Then, $f$ is instantiated for each user using their specific weights. The streaming phase, denoted by regular arrows, may be performed as often as required using the person-specific model $f$. Each model $f$ receives $X$ as an input and calculates the binary cross-entropy loss with respect to its own user's label. The embedding model $h_{bb}$ is not trainable. The trainable part $h_{gen}$ obtains the gradients for the batch of $n\times B$ neural networks.
Figure 4: (a) Average top-1 accuracy over five datasets (excluding vgg2-fp) vs. the number of parameters, for our models and competing efficient face recognition models. (b) The same comparison against the number of operations, in megaFLOPs.
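The training setup described in the Figure 3 caption, where every generated model scores the whole batch against its own user's label, implies a matrix of binary targets. The sketch below builds that matrix from identity labels. The function name `pairwise_labels` and the example identities are hypothetical, and counting each image as a positive for its own model is an assumption; a real pipeline might mask the diagonal.

```python
def pairwise_labels(identities):
    """Row i holds the binary targets for the model generated from sample i.

    Entry [i][j] is 1 when sample j shows the same person as sample i.
    """
    return [[1 if a == b else 0 for b in identities] for a in identities]

# A toy batch with two images of one person and one image each of two others.
ids = ["ann", "ann", "bob", "cat"]
labels = pairwise_labels(ids)

# Share of positive targets in the whole matrix; it shrinks as the number of
# distinct identities in the batch grows.
positive_fraction = sum(map(sum, labels)) / len(ids) ** 2
```

Because most entries are negatives, an unweighted loss would be dominated by them, which is one plausible motivation for the weighted binary cross-entropy and for sampling batches whose negatives are hard.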
