HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere

Hatef Otroshi Shahreza, Sébastien Marcel

TL;DR

Web-crawled face datasets raise privacy concerns, which motivates training face recognition models on synthetic data instead. HyperFace formulates dataset generation as packing reference embeddings on the identity embedding hypersphere and solves the packing problem by gradient-based optimization with a manifold-regularization term, followed by diffusion-based image synthesis. Models trained on HyperFace data achieve state-of-the-art or competitive performance on multiple real benchmarks, and the approach scales with more identities and more images. The work also discusses ethical considerations and potential privacy risks, offering a practical route to scalable, privacy-preserving synthetic face data while acknowledging the challenges that remain.

Abstract

Face recognition datasets are often collected by crawling the Internet without individuals' consent, raising ethical and privacy concerns. Generating synthetic datasets for training face recognition models has emerged as a promising alternative. However, generating synthetic datasets remains challenging because it requires adequate inter-class and intra-class variation. While advances in generative models have made it easier to increase intra-class variation in face datasets (such as pose and illumination), generating sufficient inter-class variation is still difficult. In this paper, we formulate dataset generation as a packing problem on the embedding space (represented on a hypersphere) of a face recognition model and propose a new synthetic dataset generation approach, called HyperFace. We formalize our packing problem as an optimization problem and solve it with a gradient descent-based approach. Then, we use a conditional face generator model to synthesize face images from the optimized embeddings. We use our generated datasets to train face recognition models and evaluate the trained models on several real benchmark datasets. Our experimental results show that models trained with HyperFace achieve state-of-the-art performance among face recognition models trained on synthetic datasets.
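
The packing idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the objective (sum of inverse squared pairwise distances), the step size, and the function names `normalize`, `min_pairwise_distance`, and `pack_on_hypersphere` are all assumptions chosen for illustration, and the manifold-regularization term mentioned in the TL;DR is omitted. Random vectors stand in for embeddings from a face recognition model.

```python
import numpy as np

def normalize(x):
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def min_pairwise_distance(x):
    """Smallest Euclidean distance between any two distinct rows."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore the distance of a point to itself
    return d.min()

def pack_on_hypersphere(x0, steps=200, lr=0.01, eps=1e-9):
    """Spread points on the unit sphere by projected gradient descent.

    Assumed objective (not taken from the paper): the sum over all pairs
    of 1 / ||x_i - x_j||^2, which penalizes points that sit close together.
    """
    x = normalize(x0)
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]   # (n, n, d) pairwise differences
        sq = (diff ** 2).sum(-1) + eps         # (n, n) squared distances
        np.fill_diagonal(sq, np.inf)           # self-pairs contribute nothing
        # Gradient of the objective w.r.t. x_i: -2 * sum_j (x_i - x_j) / sq_ij^2
        grad = -2.0 * (diff / sq[..., None] ** 2).sum(axis=1)
        # Gradient step, then project back onto the hypersphere
        x = normalize(x - lr * grad)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(32, 16))  # stand-in for initial identity embeddings
    x = pack_on_hypersphere(x0)
    print("min distance before:", min_pairwise_distance(normalize(x0)))
    print("min distance after: ", min_pairwise_distance(x))
```

Renormalizing after each step keeps every point on the hypersphere, so the update only redistributes points over the surface. A larger minimum pairwise distance after optimization corresponds to better-separated reference embeddings, which is the sense of inter-class variation the abstract refers to.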

Paper Structure

This paper contains 31 sections, 2 theorems, 12 equations, 6 figures, 14 tables, 3 algorithms.

Key Result

Theorem 1

Let $\bm{X}_\text{ref}=\{\bm{x}_{\text{ref},i}\}_{i=1}^{n_\text{id}}$ represent $n_\text{id}$ points on a $n_\mathcal{X}$-dimensional hypersphere $\mathcal{S}$. Consider an objective function: where $\ell(\cdot,\cdot)$ denotes a pairwise function. The goal is to minimize $\mathcal{L}(\bm{X}_\text{ref})$ for $\bm{X}_\text{ref}=\{\bm{x}_{\text{ref},i}\}_{i=1}^{n_\text{id}}$. Suppose in each iterati

Figures (6)

  • Figure 1: Sample face images from the HyperFace dataset
  • Figure 2: Block diagram of HyperFace dataset generation. We start from randomly synthesized face images and extract their embeddings using a pretrained face recognition model $F$. The extracted embeddings are normalized and used as initial points $\{\bm{x}_{\text{ref},i}\}_{i=1}^{n_\text{id}}$ in the HyperFace optimization. The HyperFace optimization tries to increase the inter-class variation of the synthetic identities while a regularization term keeps the points on the manifold of the face recognition model over the hypersphere. The resulting points are then passed to a face generator model $G$, which generates synthetic face images from the embeddings.
  • Figure 3: Sample pairs of images with the highest similarity between face embeddings of images in synthesized dataset and training dataset of StyleGAN, which was used to generate random images for initialization and regularization in the HyperFace optimization.
  • Figure 4: Sample face images of different synthetic identities from the HyperFace dataset.
  • Figure 5: Sample face images of one subject from the HyperFace dataset (intra-class variations).
  • ...and 1 more figure
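
The comparison behind Figure 3 (finding the most similar pairs between the synthesized dataset and a training dataset) can be sketched as a cosine-similarity search over two embedding sets. The function name `top_similar_pairs` and the random arrays are hypothetical; in practice the embeddings would come from a face recognition model, and the paper's exact procedure may differ.

```python
import numpy as np

def top_similar_pairs(synth, train, k=3):
    """Return the k (synth_idx, train_idx, cosine) triples with highest similarity."""
    a = synth / np.linalg.norm(synth, axis=1, keepdims=True)
    b = train / np.linalg.norm(train, axis=1, keepdims=True)
    sim = a @ b.T                                  # (n_synth, n_train) cosine matrix
    flat = np.argsort(sim, axis=None)[::-1][:k]    # flat indices of the largest entries
    rows, cols = np.unravel_index(flat, sim.shape)
    return [(int(r), int(c), float(sim[r, c])) for r, c in zip(rows, cols)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synth = rng.normal(size=(4, 8))   # stand-in for synthesized-image embeddings
    train = rng.normal(size=(6, 8))   # stand-in for training-image embeddings
    for s, t, c in top_similar_pairs(synth, train):
        print(f"synth {s} vs train {t}: cosine similarity {c:.3f}")
```

Pairs near the top of this ranking are the ones worth inspecting visually, since a very high similarity would suggest that a synthetic identity resembles a real person in the training data.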

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • Corollary 1