LH2Face: Loss function for Hard High-quality Face

Fan Xie, Yang Wang, Yikang Jiao, Zhenyu Yuan, Congxi Chen, Chuanxin Zhao

TL;DR

LH2Face addresses the challenge of recognizing hard high-quality faces by introducing a vMF-based similarity with an adaptive margin (Uncertainty-Aware Margin Function), complemented by proxy-based losses to shape the proxy–sample space and a face-reconstruction renderer to inject 3D cues into FR training. The method explicitly ties margin and concentration to sample quality via $m = 0.35\mu_{|\mathbf{z}|}$ and $\kappa = |\mathbf{z}|$, and optimizes the overall loss $\mathcal{L}_{\text{FR}} = \mathcal{L}_{\text{vMF}} + \mathcal{L}_{\text{proxy-based}}$, where $\mathcal{L}_{\text{proxy-based}} = \mathcal{L}_{\text{pps}} + \mathcal{L}_{\text{pns}} + \mathcal{L}_{\text{pp}}$. A reconstruction branch adds $\mathcal{L}_{\text{train}} = \mathcal{L}_{\text{FR}} + \lambda_{\text{reco}}\mathcal{L}_{\text{reco}} + \lambda_{\text{canon}}\mathcal{L}_{\text{FR}}^{\text{canon}} + \lambda_{\text{view}}\mathcal{L}_{\text{view}}$ to jointly optimize FR and 3D-aware reconstruction. Empirical results on CPLFW, IJB-B/C, and related high-quality datasets show improvements over strong baselines, validating the approach and highlighting reconstruction as a productive auxiliary signal, while acknowledging limitations on very low-quality data and suggesting diffusion/GAN-based reconstruction as future work.

Abstract

In current practical face authentication systems, most face recognition (FR) algorithms are based on cosine similarity with softmax classification. Despite its reliable classification performance, this method struggles with hard samples. A popular strategy to improve FR performance is to incorporate angular or cosine margins. However, this strategy does not take face quality or recognition hardness into account; it simply increases the margin value, which results in an overly uniform training strategy. To address this problem, a novel loss function is proposed, named Loss function for Hard High-quality Face (LH2Face). First, a similarity measure based on the von Mises-Fisher (vMF) distribution is introduced, specifically focusing on the logarithm of the Probability Density Function (PDF), which represents the distance between a probability distribution and a vector. Then, an adaptive margin-based multi-classification method using softmax, called the Uncertainty-Aware Margin Function, is implemented. Furthermore, proxy-based loss functions are used to apply extra constraints between the proxy and sample to optimize their representation space distribution. Finally, a renderer is constructed that optimizes FR through face reconstruction and vice versa. Our LH2Face is superior to similar schemes on hard high-quality face datasets, achieving 49.39% accuracy on the IJB-B dataset, which surpasses the second-place method by 2.37%.
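The vMF log-density that the abstract uses as a similarity measure can be illustrated with a minimal sketch. It implements the textbook vMF density $\log p(\bm{x}; \bm{\mu}, \kappa) = \log C_n(\kappa) + \kappa\,\bm{\mu}^\text{T}\bm{x}$ with $C_n(\kappa) = \kappa^{n/2-1} / \left((2\pi)^{n/2} I_{n/2-1}(\kappa)\right)$, not the exact LH2Face formulation, whose margin and parameterization are not given in this summary. The function names `log_bessel_i` and `vmf_log_pdf` are hypothetical, and the Bessel function is evaluated with a plain power series that is only suitable for moderate $\kappa$.

```python
import math


def log_bessel_i(v, x, terms=200):
    """Log of the modified Bessel function of the first kind, I_v(x), for x > 0.

    Uses the power series I_v(x) = sum_k (x/2)^(2k+v) / (k! * Gamma(k+v+1)),
    summed in log space. Assumption: adequate for moderate x only.
    """
    log_terms = [
        (2 * k + v) * math.log(x / 2.0) - math.lgamma(k + 1) - math.lgamma(k + v + 1)
        for k in range(terms)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def vmf_log_pdf(x, mu, kappa):
    """Log-density of the vMF distribution on the unit sphere in R^n.

    x and mu are unit vectors given as sequences of floats; kappa > 0 is the
    concentration. The only data-dependent term is kappa * mu^T x.
    """
    n = len(x)
    v = n / 2.0 - 1.0
    log_c = v * math.log(kappa) - (n / 2.0) * math.log(2.0 * math.pi) - log_bessel_i(v, kappa)
    cos_theta = sum(a * b for a, b in zip(mu, x))
    return log_c + kappa * cos_theta


if __name__ == "__main__":
    mu = [0.0, 0.0, 1.0]
    aligned = [0.0, 0.0, 1.0]
    orthogonal = [1.0, 0.0, 0.0]
    # A sample aligned with the mean direction scores higher than an orthogonal one.
    print(vmf_log_pdf(aligned, mu, kappa=2.0), vmf_log_pdf(orthogonal, mu, kappa=2.0))
```

Because the normalizer $\log C_n(\kappa)$ depends only on $\kappa$ and $n$, two samples with the same concentration are ranked by $\bm{\mu}^\text{T}\bm{x}$ alone; the concentration changes how sharply the score falls off with the angle.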

Paper Structure

This paper contains 82 sections, 98 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: The first row of the image shows high-quality samples, while the second row shows low-quality samples. The numbers below each image give the feature norm (the L2 norm of the feature vector) and the angle between the image’s feature vector and the corresponding proxy feature vector. For example, $21.45-40.12^{\circ}$ means the feature norm is $21.45$ and the angle is $40.12^{\circ}$. A relatively simple classification is used here, with a feature norm of $20$ as the boundary between high-quality and low-quality samples. This boundary is not strictly accurate, since quality cannot be perfectly quantified; the value $20$ is used only for ease of understanding. The larger the angle between the image's feature vector and the corresponding proxy feature vector, the harder the sample is.
  • Figure 2: This is a visualization of the values of $\bm{\mu}^\text{T} \bm{x}$ on the unit sphere, where $\bm{\xi}$ is a vector on the sphere that satisfies $\cos\theta = \bm{\mu}^\text{T} \bm{x}$. It belongs to a ring of dimension $n - 2$.
  • Figure 3: This is the plot of the vMF distribution for $n = 2$, showing the relationship between the PDF and $\bm{\mu}^\text{T} \bm{x} \in \left[-1, 1\right]$.
  • Figure 4: The image shows the similarity distribution between the proxies of ArcFace before and after adding $\mathcal{L}_\text{pps}$, along with their corresponding positive samples.
  • Figure 5: The image shows a portion of the pictures included in the entire test set, revealing the style differences in the dataset images.
  • ...and 8 more figures