Boosting Unconstrained Face Recognition with Targeted Style Adversary

Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Seyed Rasoul Hosseini, Nasser M. Nasrabadi

TL;DR

This work tackles domain gaps in unconstrained face recognition (FR) by introducing Targeted Style Adversary (TSA), a lightweight hidden-space augmentation that blends labeled and unlabeled feature statistics to create challenging yet plausible styles. TSA adds a recognizability constraint, based on an entropy measure, so that it does not produce unrecognizable augmentations. It also uses a gradient-driven objective to balance the FR loss against recognizability. The result is efficient, scalable training that needs no image-space generative models. Across TinyFace, IJB-B, IJB-C, IJB-S, SCFace, and cross-resolution benchmarks, TSA shows consistent gains and matches or outperforms competing methods. It also trains roughly 70% faster and uses about 40% less memory. The approach is orthogonal to angular-margin losses: it improves their effectiveness while remaining practical for large-scale FR pipelines.
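
The TL;DR mentions an entropy-based recognizability measure without defining it. The sketch below shows one plausible form, and it rests on our own assumptions. Recognizability is scored as the entropy of a softmax over scaled cosine similarities between an embedding and a set of identity centers. The helper names and the temperature `scale=16.0` are hypothetical, and the paper's actual definition may differ.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def recognizability_entropy(embedding, centers, scale=16.0):
    """Entropy of the softmax over scaled cosine similarities to identity centers.

    Assumption for illustration: low entropy = recognizable (one dominant identity),
    high entropy = hard to recognize (similarity spread across identities).
    `scale` is a hypothetical temperature, not a value taken from the paper.
    """
    e = embedding / np.linalg.norm(embedding)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    p = softmax(scale * (c @ e))
    return float(-(p * np.log(p + 1e-12)).sum())


centers = np.eye(4)  # four toy identity centers
sharp = recognizability_entropy(np.array([1.0, 0.0, 0.0, 0.0]), centers)
flat = recognizability_entropy(np.ones(4), centers)
print(f"sharp={sharp:.6f}  flat={flat:.4f}  log(4)={np.log(4):.4f}")
```

An embedding aligned with a single center gives an entropy close to zero. An embedding equally similar to all K centers gives an entropy of log(K), which is the kind of augmentation such a constraint would discourage.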

Abstract

While deep face recognition models have demonstrated remarkable performance, they often struggle on inputs from domains beyond their training data. Recent attempts expand the training set through image-space augmentation with image generation modules, which is computationally expensive and inherently challenging. In an orthogonal direction, we present a simple yet effective method that expands the training data by interpolating between instance-level feature statistics across labeled and unlabeled sets. Our method, dubbed Targeted Style Adversary (TSA), is motivated by two observations: (i) the input domain is reflected in feature statistics, and (ii) face recognition model performance is influenced by style information. Shifting labeled features towards an unlabeled style implicitly synthesizes challenging training instances. We devise a recognizability metric that constrains our framework to preserve the inherent identity-related information of labeled instances. Evaluations on unconstrained benchmarks demonstrate the efficacy of our method: it outperforms or is on par with its competitors while offering nearly a 70% improvement in training speed and 40% less memory consumption.
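
The core idea of interpolating instance-level feature statistics can be sketched as follows. We assume, as in AdaIN-style feature re-styling, that "style" means the per-channel mean and standard deviation of a feature map over its spatial dimensions. We also assume the new style is a linear interpolation controlled by a coefficient `lam`. The function names and the interpolation form are illustrative, not the authors' code.

```python
import numpy as np


def instance_stats(h, eps=1e-5):
    """Per-channel mean and std over the spatial dims of a (C, H, W) feature map."""
    mu = h.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(h.var(axis=(1, 2), keepdims=True) + eps)
    return mu, sigma


def mix_style(h_lab, h_unlab, lam):
    """Re-style a labeled feature map toward the style of an unlabeled one.

    lam = 0 keeps the labeled style; lam = 1 adopts the unlabeled style.
    lam may be a scalar or a per-channel array of shape (C, 1, 1).
    """
    mu_l, sig_l = instance_stats(h_lab)
    mu_u, sig_u = instance_stats(h_unlab)
    mu_new = (1.0 - lam) * mu_l + lam * mu_u
    sig_new = (1.0 - lam) * sig_l + lam * sig_u
    # Strip the labeled style, keep the normalized content, then apply the new style.
    return sig_new * (h_lab - mu_l) / sig_l + mu_new


rng = np.random.default_rng(0)
h_l = rng.normal(0.0, 1.0, size=(4, 7, 7))  # stand-in for E_1 output on a labeled face
h_u = rng.normal(2.0, 3.0, size=(4, 7, 7))  # stand-in for E_1 output on an unlabeled face
h_new = mix_style(h_l, h_u, lam=0.5)
```

With `lam=0` the labeled feature map comes back unchanged. With `lam=1` its statistics are fully replaced by those of the unlabeled sample, which corresponds to the kind of style swap used in the Figure 2 perturbation experiment.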

Paper Structure

This paper contains 18 sections, 11 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overall pipeline of the proposed method. We synthesize diverse, plausible, and novel styles by combining style information from the labeled and unlabeled samples. We move the labeled samples (yellow) toward the unlabeled (blue) instances in the style space while ensuring that we do not remove identity-related information from the final embedding, i.e., we push them away from unrecognizable clusters. The circles represent style information.
  • Figure 2: Effect of unconstrained (UC) style information on the FR model. Top: UMAP visualization of the instance-wise $\mu$ and $\sigma$ (see the equations defining $\mu$ and $\sigma$) of the output of the third block of ResNet-50, highlighting the disparity between the SC (MS1MV2) and UC (WiderFace) datasets. Bottom: Performance of the FR model on the original IJB-B/IJB-C and on their style-perturbed versions, in which the IJB-B/IJB-C style information is swapped with that of WiderFace. The results show the FR model's susceptibility to the style of its input.
  • Figure 3: First stage: the backbone $E = E_2(E_1)$ is fixed. A labeled sample and an unlabeled sample are forwarded through $E_1$. A novel style is then computed by combining the labeled and unlabeled style information, and the feature map $\mathbf{h}'$ is reconstructed with this novel style. Next, the FR loss $L_{fr}$ and the recognizability loss are computed, and their gradients are used to optimize the interpolation coefficients. Second stage: the labeled data and the synthesized feature map $\mathbf{h}'$ are used to train the FR model $E$.
  • Figure 4: Left: Histogram of BRISQUE scores for unrecognizable and recognizable instances from the WiderFace dataset, drawn from the final training epoch. A lower BRISQUE score indicates higher image quality. The histogram shows that the proposed approach for identifying unrecognizable instances effectively characterizes facial quality. Right: Randomly sampled unrecognizable and recognizable images from WiderFace. Some recognizable instances may be mistakenly assigned to the unrecognizable (UR) cluster; however, our primary interest lies in the center of this cluster, which represents the majority of its members.
  • Figure 5: Ablation study on constraining the adversarial objective function. Relaxing the constraint ($\beta < 1.0$) slightly improves TinyFace but degrades IJB-B performance, while a large $\beta$ benefits neither IJB-B nor TinyFace.
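
The two-stage procedure described in the Figure 3 caption can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration:

- The frozen E_2 is replaced by global average pooling plus a random linear classifier.
- The FR loss is plain cross-entropy, and the recognizability term is the softmax entropy.
- The adversarial objective is taken to be L_fr - beta * entropy, maximized by projected gradient ascent on per-channel interpolation coefficients kept in [0, 1].

Gradients come from central finite differences only to keep the example dependency-free; a real implementation would use automatic differentiation. All helper names are hypothetical.

```python
import numpy as np


def mix_style(h_lab, h_unlab, lam, eps=1e-5):
    """AdaIN-style re-styling of h_lab toward h_unlab with per-channel weights lam."""
    mu_l = h_lab.mean(axis=(1, 2), keepdims=True)
    sig_l = np.sqrt(h_lab.var(axis=(1, 2), keepdims=True) + eps)
    mu_u = h_unlab.mean(axis=(1, 2), keepdims=True)
    sig_u = np.sqrt(h_unlab.var(axis=(1, 2), keepdims=True) + eps)
    mu_new = (1 - lam) * mu_l + lam * mu_u
    sig_new = (1 - lam) * sig_l + lam * sig_u
    return sig_new * (h_lab - mu_l) / sig_l + mu_new


def frozen_head(h, W):
    """Toy stand-in for the frozen E_2: global average pooling + linear classifier."""
    return W @ h.mean(axis=(1, 2))


def objective(lam, h_l, h_u, W, y, beta):
    """Hypothetical adversarial objective: cross-entropy minus beta * softmax entropy."""
    z = frozen_head(mix_style(h_l, h_u, lam), W)
    p = np.exp(z - z.max())
    p /= p.sum()
    ce = -np.log(p[y] + 1e-12)            # stand-in for L_fr
    ent = -(p * np.log(p + 1e-12)).sum()  # stand-in for the recognizability term
    return ce - beta * ent                # harder sample, but penalize losing recognizability


def first_stage(h_l, h_u, W, y, beta=1.0, steps=20, lr=0.05, delta=1e-4):
    """Projected gradient ascent on per-channel coefficients; E_2 and W stay fixed."""
    lam = np.full((h_l.shape[0], 1, 1), 0.5)
    for _ in range(steps):
        grad = np.zeros_like(lam)
        for c in range(lam.shape[0]):  # central finite differences, one channel at a time
            d = np.zeros_like(lam)
            d[c] = delta
            grad[c] = (objective(lam + d, h_l, h_u, W, y, beta)
                       - objective(lam - d, h_l, h_u, W, y, beta)) / (2 * delta)
        lam = np.clip(lam + lr * grad, 0.0, 1.0)
    return lam


rng = np.random.default_rng(0)
h_l = rng.normal(0.0, 1.0, size=(4, 7, 7))  # labeled E_1 features (toy)
h_u = rng.normal(2.0, 3.0, size=(4, 7, 7))  # unlabeled E_1 features (toy)
W = rng.normal(size=(5, 4))                 # toy classifier over 5 identities
lam_opt = first_stage(h_l, h_u, W, y=2)
# Second stage (not shown): h_prime, together with the labeled data, would train E.
h_prime = mix_style(h_l, h_u, lam_opt)
```

In this toy, `beta` plays the role of the constraint weight studied in the Figure 5 ablation. A larger `beta` pulls the coefficients toward styles that keep the prediction confident, and a smaller `beta` lets the FR loss term dominate.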
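
Figure 1 describes pushing augmented features away from unrecognizable clusters, and the Figure 4 caption focuses on the center of the UR cluster. The sketch below illustrates that idea. It assumes a boolean mask `ur_mask` flagging unrecognizable unlabeled embeddings is already available; how the paper obtains it is not described in this section. The UR center is computed as the normalized mean of the flagged unit-norm embeddings, and any embedding is scored by its cosine similarity to that center. The helper names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors (last axis) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ur_center(embeddings, ur_mask):
    """Unit-norm mean of the embeddings flagged as unrecognizable (mask assumed given)."""
    return l2_normalize(l2_normalize(embeddings)[ur_mask].mean(axis=0))


def ur_similarity(embeddings, center):
    """Cosine similarity to the UR center: higher means closer to 'unrecognizable'."""
    return l2_normalize(embeddings) @ center


# Toy data: nine flagged embeddings share one direction, one flagged embedding is a
# misassigned recognizable face, and five unflagged embeddings point elsewhere.
emb = np.array([[1.0, 0.0, 0.0]] * 9 + [[0.0, 1.0, 0.0]] + [[0.0, 0.0, 1.0]] * 5)
mask = np.array([True] * 10 + [False] * 5)
center = ur_center(emb, mask)
sims = ur_similarity(emb, center)
```

One of the ten flagged embeddings is misassigned, yet the center still has a cosine similarity of about 0.994 with the majority direction. This illustrates why the cluster center can be a useful summary even when a few members are misassigned.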