ONOT: a High-Quality ICAO-compliant Synthetic Mugshot Dataset

Nicolò Di Domenico, Guido Borghi, Annalisa Franco, Davide Maltoni

TL;DR

The paper tackles privacy and bias in facial datasets by introducing ONOT, a synthetic mugshot collection designed to be ISO/ICAO compliant for eMRTD applications. It presents a scalable generation pipeline based on a fine-tuned diffusion model, producing 960k images across 15k pseudo-classes, with stringent ISO/ICAO constraints and subsequent intra- and inter-class identity consistency checks, plus Print&Scan simulation. The results show that only a subset of generated identities survive ISO and consistency tests (4032 identities after ISO; 55–255 after identity consistency depending on thresholds), revealing inherent challenges and bias patterns in synthetic, standards-aligned face data. The work provides reproducible prompts and a release strategy to enable further research in Morphing Attack Detection, Face Quality Assessment, and related document-analysis tasks, contributing a standards-aligned resource for privacy-preserving evaluation and benchmarking.

Abstract

Nowadays, state-of-the-art AI-based generative models represent a viable solution to overcome privacy issues and biases in the collection of datasets containing personal information, such as faces. Following this intuition, in this paper we introduce ONOT, a synthetic dataset specifically focused on the generation of high-quality faces in adherence to the requirements of the ISO/IEC 39794-5 standard, which, following the guidelines of the International Civil Aviation Organization (ICAO), defines the interchange formats of face images in electronic Machine-Readable Travel Documents (eMRTD). The strictly controlled and varied mugshot images included in ONOT are useful in research fields related to the analysis of face images in eMRTD, such as Morphing Attack Detection and Face Quality Assessment. The dataset is publicly released together with the details of the generation procedure, to improve reproducibility and enable future extensions.

Paper Structure (11 sections, 4 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Samples of the ONOT dataset compliant with the ISO/IEC 39794-5 standard and ICAO guidelines. The dataset exhibits great inter-class variety in terms of, among others, gender, ethnicity, age and face-specific traits.
  • Figure 2: Steps for the generation of the ONOT dataset. Starting from the initial image generation procedure, we apply a commercial SDK to verify whether the generated images are compliant with the ISO/ICAO standard. The subsequent steps verify intra-class consistency, i.e. all images of the same subject share the same identity, and inter-class consistency, i.e. each subject presents a unique identity with respect to all the other generated subjects.
  • Figure 3: In addition to ISO/ICAO-compliant samples, other images are generated for each identity. As shown, intra-class variance is present in terms of different head and body poses and facial traits.
  • Figure 4: Samples of the subject variability included in the ONOT dataset, covering different genders, ethnicities, ages and facial traits. The naming convention is reported in Table \ref{tab:file-naming}.
  • Figure 5: Visual samples of the application of the P&S operation (see Sect. \ref{sec:pes}) on two original images.
  • ...and 3 more figures
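
The two identity checks described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each image has already been mapped to a face embedding by some recognition model, that similarity is measured with cosine similarity, and that a subject is compared to other subjects through the mean of its embeddings. The function names and the thresholds `intra_thr` and `inter_thr` are hypothetical, chosen only for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def is_intra_consistent(embs: np.ndarray, intra_thr: float) -> bool:
    """Intra-class check: every pair of images of one subject must be
    at least `intra_thr` similar (assumed criterion, not from the paper)."""
    e = l2_normalize(embs)
    sims = e @ e.T
    upper = np.triu_indices(len(e), k=1)  # each unordered pair once
    return bool(np.all(sims[upper] >= intra_thr))


def filter_identities(embeddings: dict, intra_thr: float, inter_thr: float) -> list:
    """Return the subject ids that pass both checks.

    `embeddings` maps a subject id to an array of shape (n_images, dim).
    Subjects are visited in insertion order; a subject is dropped when its
    mean embedding is at least `inter_thr` similar to an already kept one
    (a greedy inter-class check, assumed here for simplicity).
    """
    kept_ids, kept_centroids = [], []
    for subject_id, embs in embeddings.items():
        if not is_intra_consistent(embs, intra_thr):
            continue  # images of this subject do not share one identity
        centroid = l2_normalize(l2_normalize(embs).mean(axis=0))
        if any(float(centroid @ c) >= inter_thr for c in kept_centroids):
            continue  # too close to a subject that was already kept
        kept_ids.append(subject_id)
        kept_centroids.append(centroid)
    return kept_ids


# Toy 3-dimensional "embeddings" standing in for real face features.
toy = {
    "a": np.array([[1.0, 0.0, 0.0], [0.99, 0.1, 0.0]]),   # consistent
    "b": np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),    # inconsistent
    "c": np.array([[0.98, 0.05, 0.0], [1.0, 0.02, 0.0]]), # near-duplicate of "a"
    "d": np.array([[0.0, 0.0, 1.0], [0.0, 0.1, 1.0]]),    # consistent, distinct
}
surviving = filter_identities(toy, intra_thr=0.9, inter_thr=0.5)
```

Because the inter-class check is greedy, the result depends on the visiting order: of two near-duplicate subjects, the one seen first is kept. Stricter thresholds leave fewer surviving identities, which is consistent with the TL;DR's remark that the final count depends on the thresholds chosen.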