Can Text-to-Image Generative Models Accurately Depict Age? A Comparative Study on Synthetic Portrait Generation and Age Estimation
Alexey A. Novikov, Miroslav Vranka, François David, Artem Voronin
TL;DR
This study systematically evaluates whether text-to-image generative models can accurately depict age in synthetic portraits across 212 nationalities, 30 ages, and balanced genders, using two age estimators as ground truth. It compares Flux, SD3.5, and Epic Realism, finding that while identity cues are generally preserved, precise age depiction is inconsistent and biased, with Epic Realism showing the strongest age alignment but notable errors at older ages. The work highlights significant biases and outliers in synthetic data, cautions against using such images for high-stakes age tasks without filtering, and discusses mitigations through targeted training, prompt engineering, and post-hoc correction. It also addresses ethical, privacy, and security considerations for deploying synthetic age data in practical biometric workflows.
Abstract
Text-to-image generative models have shown remarkable progress in producing diverse and photorealistic outputs. In this paper, we present a comprehensive analysis of their effectiveness in creating synthetic portraits that accurately represent various demographic attributes, with a special focus on age, nationality, and gender. Our evaluation employs prompts specifying detailed profiles (e.g., "Photorealistic selfie photo of a 32-year-old Canadian male"), covering a broad spectrum of 212 nationalities, 30 distinct ages from 10 to 78, and balanced gender representation. We compare the generated images against ground truth age estimates from two established age estimation models to assess how faithfully age is depicted. Our findings reveal that although text-to-image models can consistently generate faces reflecting different identities, how accurately they depict specific ages, and how consistently they do so across diverse demographic backgrounds, remains highly variable. These results suggest that current synthetic data may be insufficiently reliable for high-stakes age-related tasks requiring robust precision, unless practitioners are prepared to invest in significant filtering and curation. Nevertheless, such data may still be useful in less sensitive or exploratory applications, where absolute age precision is not critical.
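
The prompt design described above can be sketched as a simple cross of nationality, age, and gender. The following is a minimal illustration, not the authors' pipeline: the template follows the example prompt quoted in the abstract, while the function names, the record layout, and the short input lists are assumptions made for this sketch (the full lists of 212 nationalities and 30 ages are not given here). If every combination were used, the stated dimensions would give 212 × 30 × 2 = 12,720 prompts per model; whether the study used the full cross is not stated in this section. The second helper shows one plausible way to score age fidelity, the mean absolute error between the prompted age and an estimator's output, which is likewise an assumed metric.

```python
from itertools import product

# Template modeled on the example prompt quoted in the abstract.
PROMPT_TEMPLATE = "Photorealistic selfie photo of a {age}-year-old {nationality} {gender}"


def build_prompts(nationalities, ages, genders):
    """Return one prompt record per (nationality, age, gender) combination.

    Hypothetical helper: the record layout is an assumption of this sketch.
    """
    return [
        {
            "nationality": nationality,
            "age": age,
            "gender": gender,
            "prompt": PROMPT_TEMPLATE.format(
                age=age, nationality=nationality, gender=gender
            ),
        }
        for nationality, age, gender in product(nationalities, ages, genders)
    ]


def mean_absolute_error(prompted_ages, estimated_ages):
    """Average absolute gap between prompted ages and estimator outputs.

    Hypothetical metric choice: the section does not name the metric used.
    """
    if len(prompted_ages) != len(estimated_ages):
        raise ValueError("inputs must have the same length")
    if not prompted_ages:
        raise ValueError("inputs must not be empty")
    gaps = [abs(p - e) for p, e in zip(prompted_ages, estimated_ages)]
    return sum(gaps) / len(gaps)


# Illustrative inputs only; these are not the study's actual lists or results.
example_prompts = build_prompts(
    nationalities=["Canadian", "Japanese"],
    ages=[10, 32, 78],
    genders=["male", "female"],
)
print(len(example_prompts))          # 2 * 3 * 2 combinations
print(example_prompts[2]["prompt"])  # the 32-year-old Canadian male prompt
```

Keeping the prompted age in each record makes it straightforward to pair every generated image with its target age later, so that per-nationality or per-gender error breakdowns can be computed from the same records.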
