SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes

Georgia Baltsou, Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos

TL;DR

This work addresses bias and representational gaps in facial image datasets by proposing a systematic pipeline to generate diverse synthetic face images that extend beyond traditional demographics to include non-permanent traits such as hairstyle and accessories. Using a diffusion-based approach (Stable Diffusion v2.1) and carefully crafted prompts, the authors create SDFD, a 1000-image dataset designed as an evaluation set for demographic attribute prediction. Comparative analyses show SDFD is equally or more challenging for attribute classification than established datasets like FairFace and LFW, while being significantly smaller and more controllable. The study highlights both the potential and limitations of synthetic data for fairness evaluation and outlines future directions to broaden attribute coverage and model variety to further support robust, inclusive AI systems.

Abstract

AI systems rely on extensive training on large datasets to address various tasks. However, image-based systems, particularly those used for demographic attribute prediction, face significant challenges. Many current face image datasets primarily focus on demographic factors such as age, gender, and skin tone, overlooking other crucial facial attributes like hairstyle and accessories. This narrow focus limits the diversity of the data and consequently the robustness of AI systems trained on them. This work aims to address this limitation by proposing a methodology for generating synthetic face image datasets that capture a broader spectrum of facial diversity. Specifically, our approach integrates a systematic prompt formulation strategy, encompassing not only demographics and biometrics but also non-permanent traits like make-up, hairstyle, and accessories. These prompts guide a state-of-the-art text-to-image model in generating a comprehensive dataset of high-quality realistic images and can be used as an evaluation set in face analysis systems. Compared to existing datasets, our proposed dataset proves equally or more challenging in image classification tasks while being much smaller in size.
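
The prompt formulation strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute pools below only reuse example terms that appear in the figure captions, and `UNIVERSAL_TERMS`, `REQUIRED`, `optional_prob`, and the function names are hypothetical placeholders chosen for illustration. The idea shown is simply that each prompt combines fixed universal terms with one value sampled from each selected attribute category.

```python
import random

# Hypothetical attribute pools. The values reuse example terms from the
# figure captions; the paper's full attribute lists are not reproduced here.
ATTRIBUTES = {
    "ethnicity": ["White", "Black", "East Asian", "Pacific Islander"],
    "emotion": ["tired", "angry", "stressed", "smiling"],
    "hair": ["red hair", "pink hair", "black hair"],
    "gender": ["man", "woman", "androgynous person"],
    "accessory": ["wearing glasses", "wearing hat", "wearing headscarf"],
    "pose": ["front face", "side profile"],
    "camera": ["Nikon Z9", "Fujifilm XT3"],
}

# Assumption: demographic categories always appear, others are optional.
REQUIRED = ("ethnicity", "gender")

# Assumption: placeholder for the "universal prompt terms" the captions mention.
UNIVERSAL_TERMS = ["portrait photo of a face"]


def build_prompt(rng, optional_prob=0.5):
    """Compose one prompt: universal terms plus sampled attribute values."""
    terms = []
    for category, values in ATTRIBUTES.items():
        # Required categories are always included; optional ones are
        # included with probability optional_prob.
        if category in REQUIRED or rng.random() < optional_prob:
            terms.append(rng.choice(values))
    return ", ".join(UNIVERSAL_TERMS + terms)


def build_prompts(n, seed=0):
    """Generate n prompts reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [build_prompt(rng) for _ in range(n)]


if __name__ == "__main__":
    for prompt in build_prompts(3):
        print(prompt)
```

Seeding the generator keeps the prompt set reproducible, which matters when the resulting images are meant to serve as a fixed evaluation set. Each prompt string would then be passed to the text-to-image model.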

Paper Structure

This paper contains 13 sections, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of proposed image generation process.
  • Figure 2: Examples of images generated with other Stable Diffusion versions and their corresponding prompts (universal prompt terms omitted): (a) Stable Diffusion 1.5: White, tired, man, wearing glasses, front face, Ultra HD, Nikon Z9; (b) Stable Diffusion XL Turbo: Black, angry, red hair, androgynous person, wearing glasses, side profile, Fujifilm XT3.
  • Figure 3: Example images generated with different numbers of inference steps for the prompt (apart from universal terms) "White, blonde woman, blue eyes, wearing hat": (a) 5 steps, (b) 10 steps, (c) 15 steps, (d) 30 steps, (e) 50 steps, and (f) 70 steps.
  • Figure 4: Example images generated with different values of the CFG weight for the prompt (apart from universal terms) "Asian, woman, black hair, smiling": (a) weight = 2.5, (b) weight = 5, (c) weight = 7.5, (d) weight = 10, (e) weight = 12.5, and (f) weight = 15.
  • Figure 5: Example generated images and their corresponding prompts (universal prompt terms omitted): (a) Pacific Islander, stressed, wearing headscarf, girl, black eyes, wearing lipstick, 8K; (b) East Asian, angry, red hair, man, wearing colour contact lenses, front face, Fujifilm XT3; and (c) White, androgynous person, pink hair, blue eyes, Fujifilm XT3.
  • ...and 5 more figures