
RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation

Ganlin Feng, Yuxi Long, Hafsa Ali, Erin Lou, Fahad Butt, Qian Liu, Yang Wang, Pingzhao Hu

Abstract

Rare diseases often manifest with distinctive facial phenotypes in children, offering valuable diagnostic cues for clinicians and AI-assisted screening systems. However, progress in this field is severely limited by the scarcity of curated, ethically sourced facial data and the high similarity among phenotypes across different conditions. To address these challenges, we introduce RDFace, a curated benchmark dataset comprising 456 pediatric facial images spanning 103 rare genetic conditions (average 4.4 samples per condition). Each ethically verified image is paired with standardized metadata. RDFace enables the development and evaluation of data-efficient AI models for rare disease diagnosis under real-world low-data constraints. We benchmark multiple pretrained vision backbones using cross-validation and explore synthetic augmentation with DreamBooth and FastGAN. Generated images are filtered via facial landmark similarity to maintain phenotype fidelity and merged with real data, improving diagnostic accuracy by up to 13.7% in ultra-low-data regimes. To assess semantic validity, phenotype descriptions generated by a vision-language model from real and synthetic images achieve a report similarity score of 0.84. RDFace establishes a transparent, benchmark-ready dataset for equitable rare disease AI research and presents a scalable framework for evaluating both diagnostic performance and the integrity of synthetic medical imagery.

Paper Structure

This paper contains 58 sections, 3 equations, 22 figures, 16 tables, and 1 algorithm.

Figures (22)

  • Figure 1: Pipeline of RDFace Evaluation. The pipeline includes dataset curation, classification tasks, synthetic data generation and augmentation, and benchmark evaluation for rare disease (RD) diagnostic support.
  • Figure 2: Overview of RDFace Dataset. (a) Geographic distribution of cases across 46 countries, with circle size proportional to the number of cases per country. (b) Sex distribution. (c) Age group distribution of patients with an average age of 6.36 years.
  • Figure 3: Pipeline of synthetic data generation and evaluation. Real pediatric facial images are first preprocessed using Real-ESRGAN and DDColor, then used to generate synthetic faces via DreamBooth (class-conditioned) and FastGAN (unconditional). Generated images are evaluated for facial realism (RetinaFace and LPIPS) and phenotype consistency (landmark-based cosine similarity).
  • Figure 4: Representative samples of synthetic images. (a) DreamBooth-generated synthetic images conditioned on AAR. (b) FastGAN-generated synthetic images.
  • Figure 5: Comparison of phenotype descriptions generated by a vision-language model (VLM) for a real image and a corresponding synthetic image. Left: real image; Right: DreamBooth-generated synthetic image. The real image has been visually processed to reduce identifiability in accordance with privacy and ethical considerations. Green indicates consistent phenotype terms, while red indicates conflicting descriptions.
  • ...and 17 more figures
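
The abstract and the Figure 3 caption describe filtering generated images by landmark-based cosine similarity. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the normalization step (centering on the centroid and scaling to unit norm), the function names, and the 0.9 threshold are all hypothetical, and the landmarks are assumed to come from an external face-landmark detector as lists of (x, y) pairs in a fixed order.

```python
import math


def normalize_landmarks(points):
    """Center (x, y) landmarks on their centroid, scale to unit norm, and flatten.

    Assumption: removing translation and scale this way makes faces of
    different size and position comparable; the paper may do this differently.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(dx * dx + dy * dy for dx, dy in centered))
    if scale == 0:
        raise ValueError("degenerate landmarks: all points coincide")
    return [coord / scale for point in centered for coord in point]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_synthetic(real_landmarks, synthetic_landmarks, threshold=0.9):
    """Keep synthetic images whose landmarks resemble the real reference.

    Returns a list of (index, similarity) pairs for the kept images.
    The threshold value is a placeholder, not a value from the paper.
    """
    reference = normalize_landmarks(real_landmarks)
    kept = []
    for index, landmarks in enumerate(synthetic_landmarks):
        similarity = cosine_similarity(reference, normalize_landmarks(landmarks))
        if similarity >= threshold:
            kept.append((index, similarity))
    return kept


# Toy example with three landmarks per face (real detectors return many more).
real = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
candidates = [
    [(10.0, 10.0), (14.0, 10.0), (12.0, 14.0)],  # same shape, shifted and scaled
    [(0.0, 0.0), (2.0, 0.0), (1.0, -2.0)],       # vertically flipped shape
]
kept = filter_synthetic(real, candidates)
```

In this toy example only the first candidate passes the filter, because it differs from the reference by translation and scale alone, both of which the normalization removes. In practice the threshold would need tuning against held-out real images.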