Facial Image Feature Analysis and its Specialization for Fréchet Distance and Neighborhoods

Doruk Cetin, Benedikt Schesch, Petar Stamenkovic, Niko Benjamin Huber, Fabio Zünd, Majed El Helou

TL;DR

This work provides the first analysis of domain-specific feature training and its effects on feature distance, in the widely researched facial image domain, supported by extensive experiments and in-depth user studies.

Abstract

Assessing distances between images and image datasets is a fundamental task in vision-based research. It remains a challenging open problem, and despite the criticism it receives, the most widely used method is still the Fréchet Inception Distance. The Inception network is trained on a specific labeled dataset, ImageNet, and this choice is at the core of the criticism in recent research. Improvements were shown by moving to self-supervised learning over ImageNet, which leaves the training data domain as an open question. We make that last leap and provide the first analysis of domain-specific feature training and its effects on feature distance, in the widely researched facial image domain. We provide our findings and insights on this domain specialization for Fréchet distance and image neighborhoods, supported by extensive experiments and in-depth user studies.
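For readers unfamiliar with the metric, the Fréchet distance between two feature sets is commonly computed by fitting a Gaussian to each set and evaluating d² = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The following is a minimal sketch of that standard definition, not the authors' implementation. The function name `frechet_distance` is hypothetical, and the features are assumed to be already extracted by some backbone (Inception, SwAV, DINO, or any other) into arrays of shape (samples, dimensions).

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_dims), assumed to be
    precomputed features from the same (hypothetical) backbone.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances. Clip small negative values that
    # arise from floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Because the function only sees feature arrays, swapping the feature extractor changes the distance without changing this computation, which is the degree of freedom the abstract refers to.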

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Rescaled Fréchet distances computed on Inception features (trained on ImageNet), SwAV features (trained on ImageNet), and DINO features (trained on Faces). Each distance is between the x-axis sets and CelebA-HQ data (5k samples). For better readability, we rescale all values with a ratio fixed per method and determined on an independent dataset.
  • Figure 2: Fréchet distances computed on ImageNet-trained Inception features and on Faces-trained DINO features, between synthetically generated images and CelebA-HQ images.
  • Figure 3: Samples from our user study on feature space neighborhoods. For each reference image, we show its nearest neighbors in Inception and DINO feature spaces. Inception is biased towards objects (hat and microphone), so its neighbors do not resemble the person but simply wear similar hats. DINO, in contrast, can be perturbed by occluding objects (bottom).
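
The feature space neighborhoods in Figure 3 can be illustrated with a small retrieval sketch. The source does not state which similarity measure is used, so cosine similarity is an assumption here, and `nearest_neighbors` is a hypothetical helper operating on precomputed feature vectors.

```python
import numpy as np

def nearest_neighbors(query, gallery, k=3):
    """Indices of the k gallery features most similar to the query.

    Assumption: cosine similarity as the neighborhood measure; the paper
    summary above does not specify the metric.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    return np.argsort(-sims)[:k]      # most similar first
```

Running the same query against features from two different backbones and comparing the returned indices is one way to reproduce the kind of side-by-side comparison the caption describes.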