True to Tone? Quantifying Skin Tone Fidelity and Bias in Photographic-to-Virtual Human Pipelines

Gabriel Ferri Schneider, Erick Menezes, Rafael Mecenas, Paulo Knob, Victor Araujo, Soraia Raupp Musse

Abstract

Accurate reproduction of facial skin tone is essential for realism, identity preservation, and fairness in Virtual Human (VH) rendering. However, most accessible avatar creation pipelines rely on photographic inputs that lack colorimetric calibration, which can introduce inconsistencies and bias. We propose a fully automatic and scalable methodology to systematically evaluate skin tone fidelity across the VH generation pipeline. Our approach defines a full workflow that integrates skin color and illumination extraction, texture recolorization, real-time rendering, and quantitative color analysis. Using facial images from the Chicago Face Database (CFD), we compare two skin tone extraction strategies: cheek-region sampling, following the literature, and multidimensional masking derived from full-face analysis. Additionally, we test both strategies with lighting isolation, using the pre-trained TRUST framework, employed without any training or optimization within our pipeline. Extracted skin tones are applied to MetaHuman textures and rendered under multiple lighting configurations. Skin tone consistency is evaluated objectively in the CIELAB color space using the $\Delta E$ metric and the Individual Typology Angle (ITA). The proposed methodology operates without manual intervention and, with the exception of pre-trained illumination compensation modules, the pipeline does not include learning or training stages, enabling low computational cost and large-scale evaluation. Using this framework, we generate and analyze approximately 19,848 rendered instances. Our results show phenotype-dependent behavior of extraction strategies and consistently higher colorimetric errors for darker skin tones.

Paper Structure

This paper contains 29 sections and 10 figures.

Figures (10)

  • Figure 1: From left to right, the first four columns represent subjects from the CFD dataset classified as ITA Class 1 (lighter skin tones), while the second group of four columns represents ITA Class 6 (darker skin tones). Each column corresponds to one extraction method for each group: $Cheek$, $MMM$, $T-Cheek$, and $T-MMM$. Additionally, each row (from top to bottom) shows the results for a different lighting configuration: CFD, Frontal, and Paramount.
  • Figure 2: Overview of our methodology for evaluating skin tone consistency. Starting with the ground truth images extracted from the CFD Dataset [ma2015chicago, ma2021chicago, lakshmi2021india] (1), we extract the average skin color following four different approaches: $Cheek$ (2), $MMM$ (3), $T-Cheek$ (4), and $T-MMM$ (5). Considering our MetaHuman Setup (6), we apply a normalization (7) and a variation map (8) in the texture, before rendering the final image with three different lighting configurations: Frontal Light (9), Paramount Light (10), and CFD Light (11). Finally, we validate the resulting rendered images with two metrics: $\Delta E$ (12) and ITA Error (13).
  • Figure 3: Values of $\Delta E$ for 3 light conditions and 4 extraction methods.
  • Figure 4: Values of ITA Error for 6 ITA Classes and 4 extraction methods. The higher the ITA Class number, the darker the skin tone; see the class-mapping sketch after this list.
  • Figure 5: Values of ITA Error considering 6 ITA Classes and 3 light conditions. The higher the ITA Class number, the darker the skin tone.
  • ...and 5 more figures