Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

Rami Haffar, David Sánchez, Josep Domingo-Ferrer

TL;DR

The paper introduces two Multi-Task Faces (MTF) data sets, one non-curated and one curated, composed of real celebrity face images licensed for public use. It addresses GDPR privacy concerns by sourcing images of public figures under permissive licenses and obtaining ethical approvals. Evaluations across four tasks (face recognition and race, gender, and age classification) using five deep learning models show high performance on the curated data set, with ConvNeXT achieving up to 98.88% accuracy in gender classification, 95.77% in race classification, 97.60% in age classification, and 79.87% in face recognition. The results demonstrate the value of careful data curation for accuracy and fairness and provide a publicly accessible, legally compliant benchmark for multi-task facial analysis.

Abstract

Human facial data offers valuable potential for tackling classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, particularly the EU General Data Protection Regulation, have restricted the collection and usage of human images in research. As a result, several previously published face data sets have been removed from the internet due to inadequate data collection methods and privacy concerns. While synthetic data sets have been suggested as an alternative, they fall short of accurately representing the real data distribution. Additionally, most existing data sets are labeled for just a single task, which limits their versatility. To address these limitations, we introduce the Multi-Task Faces (MTF) data set, designed for various tasks, including face recognition and classification by race, gender, and age, as well as for aiding in training generative networks. The MTF data set comes in two versions: a non-curated set containing 132,816 images of 640 individuals and a manually curated set with 5,246 images of 240 individuals, meticulously selected to maximize their classification quality. Both data sets were ethically sourced, using publicly available celebrity images in full compliance with copyright regulations. Along with providing detailed descriptions of data collection and processing, we evaluated the effectiveness of the MTF data set in training five deep learning models across the aforementioned classification tasks, achieving up to 98.88% accuracy for gender classification, 95.77% for race classification, 97.60% for age classification, and 79.87% for face recognition with the ConvNeXT model. Both MTF data sets can be accessed at https://github.com/RamiHaf/MTF_data_set

Paper Structure

This paper contains 21 sections, 2 figures, and 10 tables.

Figures (2)

  • Figure 1: Examples of collected, processed, and labeled images from the MTF data sets
  • Figure 2: Organization of the folders in the released data set