Efficient Verification-Based Face Identification

Amit Rozner, Barak Battash, Ofir Lindenbaum, Lior Wolf

TL;DR

The paper tackles efficient face verification on edge devices by replacing a large embedding-based system with a per-identity binary verifier generated by a hypernetwork $h$, which outputs weights $\theta^i$ for a compact on-device model $f(\theta^i)$. Enrollment produces $\theta^i$ from a single image using a frozen backbone $h_{bb}$ and a trainable generator $h_{gen}$, while inference uses only the small edge model, discarding $h$ to minimize computation. A novel training regime combines weighted binary cross-entropy with a norm penalty and introduces K-means Centered Sampling (KCS) to create hard-negative batches, yielding a model with $23{,}000$ parameters and $5\times 10^{6}$ FLOPS that remains competitive across six datasets. The approach demonstrates that re-framing face verification as a personalized, edge-friendly task can dramatically reduce memory and compute requirements while preserving performance, and offers a path to extending the idea to other domains.
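
The training objective mentioned above, weighted binary cross-entropy plus a norm penalty, can be illustrated with a minimal sketch. The summary does not give the actual weights or say which norm is penalized, so `pos_weight`, `lam`, and the squared L2 norm on $\theta^i$ are assumptions made purely for illustration.

```python
import math

def weighted_bce_with_norm_penalty(probs, labels, theta, pos_weight, lam):
    """Weighted BCE over one verifier's predictions plus an L2 penalty on its weights.

    pos_weight and lam are hypothetical hyperparameters, not values from the paper.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        # Positives (y = 1) are up-weighted because each per-user verifier
        # sees far more negatives than positives in a batch.
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    bce = total / len(probs)
    penalty = lam * sum(w * w for w in theta)  # assumed: squared L2 norm of theta
    return bce + penalty

# Toy batch for one user's verifier: one positive among four images.
probs = [0.9, 0.2, 0.1, 0.3]
labels = [1, 0, 0, 0]
theta = [0.5, -0.5]
loss = weighted_bce_with_norm_penalty(probs, labels, theta, pos_weight=3.0, lam=0.01)
```

Up-weighting the positive term is one common way to offset the class imbalance that arises when every image in a batch is scored by every user's verifier.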

Abstract

We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest-neighbor search into a binary classification problem; each user has their own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate the model weights using a hypernetwork $h$. This leads to the generation of a compact personalized model for face identification that can be deployed on edge devices. Key to the method's success is a novel way of generating hard negatives and carefully scheduling the training objectives. Our method yields a very compact $f$, requiring only 23k parameters and 5M floating-point operations (FLOPS). We use six face verification datasets to demonstrate that our method is on par with or better than state-of-the-art models, with a significantly reduced number of parameters and computational burden. Furthermore, we perform an extensive ablation study to demonstrate the importance of each element in our method.
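
The split between a one-time enrollment through the hypernetwork and repeated inference through the small model can be sketched as follows. This is a toy illustration under loud assumptions: the real $f$ operates on face images and the real $h_{bb}$ and $h_{gen}$ are neural networks, whereas here inputs are short float vectors, `h_bb` is a fixed truncation, `h_gen` is a random linear map, and all sizes are hypothetical.

```python
import math
import random

EMB_DIM, IN_DIM, HID = 8, 6, 4            # toy sizes, chosen only for illustration
N_THETA = IN_DIM * HID + HID + HID + 1    # parameter count of the toy verifier f

def h_bb(image):
    """Stand-in for the frozen backbone: a fixed, non-trainable feature extractor."""
    return image[:EMB_DIM]

def make_h_gen(seed=0):
    """Stand-in for the trainable generator: a linear map from embedding to theta."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(EMB_DIM)] for _ in range(N_THETA)]
    return lambda emb: [sum(w * e for w, e in zip(row, emb)) for row in W]

def f(theta, x):
    """Tiny verifier: one hidden ReLU layer, sigmoid output = P(x is the enrolled user)."""
    w1 = theta[:IN_DIM * HID]
    b1 = theta[IN_DIM * HID:IN_DIM * HID + HID]
    w2 = theta[IN_DIM * HID + HID:IN_DIM * HID + 2 * HID]
    b2 = theta[-1]
    hidden = [
        max(0.0, sum(w1[j * IN_DIM + k] * x[k] for k in range(IN_DIM)) + b1[j])
        for j in range(HID)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Enrollment (once per user): the hypernetwork turns one image into weights theta_i.
rng = random.Random(1)
x_enroll = [rng.random() for _ in range(EMB_DIM)]
h_gen = make_h_gen()
theta_i = h_gen(h_bb(x_enroll))

# Streaming (many times): only f and theta_i are needed; h is no longer used.
x_stream = [rng.random() for _ in range(IN_DIM)]
score = f(theta_i, x_stream)
```

After enrollment, only `theta_i` and `f` need to live on the device, which is where the savings in parameters and operations come from.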

Paper Structure (14 sections, 9 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Illustration of the inference phase of our method. Enrollment is performed once per user $i$, using a single facial image $x^i_{enroll}$. The enrollment output is a set of weights $\theta^i$ of $f$, designed to fit low-power devices. In the streaming phase, face images $x_{stream}$ are presented to $f$, which then decides whether they belong to person $i$. In the streaming phase, $h$ is discarded; thus, the main computational burden is mitigated.
  • Figure 2: Comparing batches obtained with conventional sampling to those obtained with $K$-means centered sampling (KCS). (a, b) A random batch sampled during training with conventional sampling; (c, d) a random batch sampled during training when using KCS.
  • Figure 3: The training phase of our method starts with a one-time enrollment procedure, which is denoted by dashed arrows. The weights $\theta$ are generated for all images in the batch $X$. Then, an instance of $f$ is created for each user from their specific weights. The streaming phase, denoted by regular arrows, may be performed as often as required using the person-specific model $f$. Each model $f$ receives $X$ as input and calculates the binary cross-entropy loss with respect to that model's user label. The embedding model $h_{bb}$ is not trainable. The trainable part $h_{gen}$ receives the gradients for the batch of $n\times B$ neural networks.
  • Figure 4: (a) Average top-1 accuracy vs. the number of parameters on five datasets (excluding vgg2-fp) for our models and competing efficient face recognition models. (b) The same comparison, with accuracy plotted against the number of operations in megaFLOPs.
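
The idea behind the KCS batches compared in Figure 2 can be sketched roughly as follows. The text here only states that KCS creates hard-negative batches, so the procedure below (cluster identity embeddings with k-means, then draw every identity in a batch from a single cluster) is one plausible reading rather than the authors' algorithm; the helper names, the 2-D toy embeddings, and all sizes are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def kcs_batch(assign, n_ids, rng):
    """Draw n_ids identities that all fall in the same k-means cluster."""
    clusters = {}
    for idx, c in enumerate(assign):
        clusters.setdefault(c, []).append(idx)
    eligible = [m for m in clusters.values() if len(m) >= n_ids]
    return rng.sample(rng.choice(eligible), n_ids)

# Toy identity embeddings: two groups of ten look-alike identities in 2-D.
rng = random.Random(0)
blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(10)]
blob_b = [[rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)] for _ in range(10)]
identity_embs = blob_a + blob_b

assign = kmeans(identity_embs, k=2)
batch = kcs_batch(assign, n_ids=4, rng=rng)  # indices of the identities in the batch
```

Because all identities in such a batch lie close together in embedding space, each user's verifier is trained against negatives that resemble the enrolled face, in contrast to a conventionally sampled batch where most negatives are easy to reject.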