
TrueSkin: Towards Fair and Accurate Skin Tone Recognition and Generation

Haoming Lu

TL;DR

This work introduces TrueSkin, a dataset of 7299 images systematically categorized into 6 skin tone classes and collected under diverse lighting conditions, camera angles, and capture settings. It demonstrates that training a recognition model on TrueSkin improves classification accuracy by more than 20% compared to LMMs and conventional approaches, and that fine-tuning with TrueSkin significantly improves skin tone fidelity in image generation models.

Abstract

Skin tone recognition and generation play important roles in model fairness, healthcare, and generative AI, yet they remain challenging due to the lack of comprehensive datasets and robust methodologies. Compared with their performance on other human image analysis tasks, state-of-the-art large multimodal models (LMMs) and image generation models struggle to recognize and synthesize skin tones accurately. To address this, we introduce TrueSkin, a dataset of 7299 images systematically categorized into 6 classes and collected under diverse lighting conditions, camera angles, and capture settings. Using TrueSkin, we benchmark existing recognition and generation approaches and reveal substantial biases: LMMs tend to misclassify intermediate skin tones as lighter ones, whereas generative models struggle to produce the specified skin tone when unrelated attributes in the prompt, such as hairstyle or environmental context, carry inherent biases. We further demonstrate that training a recognition model on TrueSkin improves classification accuracy by more than 20% compared to LMMs and conventional approaches, and that fine-tuning with TrueSkin significantly improves skin tone fidelity in image generation models. Our findings highlight the need for comprehensive datasets like TrueSkin, which not only serves as a benchmark for evaluating existing models but also provides a valuable training resource for improving fairness and accuracy in skin tone recognition and generation.
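The recognition bias described in the abstract (intermediate tones misread as lighter ones) can be quantified by splitting a classifier's errors by direction. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes an ordinal ordering of the six class names that appear in Figure 1 (pale, light, medium, tan, brown, dark), and the helper names `accuracy` and `lightness_bias` are hypothetical.

```python
from collections import Counter

# Assumed lightest-to-darkest ordering of the six TrueSkin class names from Figure 1.
# This ordering is an illustrative assumption, not taken from the paper.
SKIN_TONES = ["pale", "light", "medium", "tan", "brown", "dark"]
RANK = {name: i for i, name in enumerate(SKIN_TONES)}


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the ground-truth label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def lightness_bias(y_true, y_pred):
    """Count errors that predict a lighter vs. darker tone than the true label.

    Raises KeyError if a label is not one of SKIN_TONES.
    """
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            continue  # correct predictions carry no directional error
        counts["lighter" if RANK[p] < RANK[t] else "darker"] += 1
    errors = counts["lighter"] + counts["darker"]
    return {
        "lighter": counts["lighter"],
        "darker": counts["darker"],
        # Share of all errors that lean lighter; 0.0 when there are no errors.
        "lighter_share": counts["lighter"] / errors if errors else 0.0,
    }
```

On hypothetical toy labels `y_true = ["medium", "tan", "brown", "pale", "medium"]` and `y_pred = ["light", "medium", "brown", "pale", "pale"]`, `accuracy` returns 0.4 and all three errors fall in the lighter direction (`lighter_share` of 1.0). A `lighter_share` well above 0.5 on real predictions would indicate the kind of lightward bias the abstract reports.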

Paper Structure

This paper contains 17 sections, 5 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Samples from the TrueSkin dataset, showing discrepancies between apparent and true skin tone. The labels of the images, in left-to-right, top-to-bottom order, are: medium, pale, dark, light, brown, tan, medium, light, and pale.
  • Figure 2: Samples from existing skin tone datasets: Fitzpatrick17k (left) and SCIN (right). These datasets predominantly contain close-up images of isolated body parts, classified based on medical criteria (e.g., skin’s tendency to tan or burn), resulting in limited diversity in appearance and context.
  • Figure 3: Skin tone label distribution of existing datasets and TrueSkin.
  • Figure 4: Count of real images versus generated images.
  • Figure 5: Distribution of the proportion of skin area relative to total image area in the dataset. For example, the $<0.1$ bin contains images in which less than 10% of the pixels are skin.
  • ...and 7 more figures
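The statistic plotted in Figure 5 is a ratio of skin pixels to total pixels, grouped into bins. The sketch below shows one way to compute it, assuming a per-image binary skin mask is already available (how the paper obtains skin regions is not stated here) and assuming ten equal-width bins, of which only the first (`<0.1`) is confirmed by the caption. The functions `skin_area_fraction`, `bin_label`, and `fraction_histogram` are hypothetical.

```python
from collections import Counter


def skin_area_fraction(mask):
    """Proportion of pixels marked as skin in a 2-D mask given as rows of 0/1 or bool values."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    skin = sum(1 for row in mask for px in row if px)
    return skin / total


def bin_label(fraction, n_bins=10):
    """Map a fraction in [0, 1] to an equal-width bin label such as '<0.1' or '0.5-0.6'.

    The labels assume n_bins=10 (one decimal place); other bin counts are an
    assumption beyond Figure 5's caption.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # Multiply rather than divide by the bin width to limit float error;
    # clamp so that a fraction of exactly 1.0 lands in the last bin.
    idx = min(int(fraction * n_bins), n_bins - 1)
    if idx == 0:
        return f"<{1 / n_bins:.1f}"
    return f"{idx / n_bins:.1f}-{(idx + 1) / n_bins:.1f}"


def fraction_histogram(masks, n_bins=10):
    """Count how many masks fall into each skin-area bin."""
    return Counter(bin_label(skin_area_fraction(m), n_bins) for m in masks)
```

For instance, a 2x2 mask with two skin pixels has a fraction of 0.5 and is placed in the `0.5-0.6` bin, while an all-skin mask falls into `0.9-1.0` because of the clamp on the last bin.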