Surgeons Are Indian Males and Speech Therapists Are White Females: Auditing Biases in Vision-Language Models for Healthcare Professionals
Zohaib Hasan Siddiqui, Dayam Nadeem, Mohammad Masudur Rahman, Mohammad Nadeem, Shahab Saquib Sohail, Beenish Moalla Chaudhry
TL;DR
This study audits demographic biases in vision-language models (VLMs) for healthcare professions using a profession-aware taxonomy, a balanced FairFace dataset, and four off-the-shelf VLMs from the CLIP and OpenCLIP families. It uses neutral prompts with top-$k$ retrieval and Jensen-Shannon (JS) divergence-based bias scores to quantify gender, race, and age associations across 33 healthcare roles. The findings show systematic but model-dependent biases: age bias is especially dominant, and notable intersectional patterns emerge that could affect hiring and workforce analytics. The work highlights the need for model-specific fairness audits, intersectional benchmarks, and governance to ensure equitable and trustworthy AI deployment in healthcare.
Abstract
Vision-language models (VLMs), such as CLIP and OpenCLIP, can encode and reflect stereotypical associations between medical professions and demographic attributes learned from web-scale data. We present an evaluation protocol for healthcare settings that quantifies these biases and assesses their operational risk. Our methodology (i) defines a taxonomy spanning clinicians and allied healthcare roles (e.g., surgeon, cardiologist, dentist, nurse, pharmacist, technician), (ii) curates a profession-aware prompt suite to probe model behavior, and (iii) benchmarks demographic skew against a balanced face corpus. Empirically, we observe consistent demographic biases across multiple roles and models. Our work highlights the importance of bias identification in critical domains such as healthcare, where AI-enabled hiring and workforce analytics can have downstream implications for equity, compliance, and patient trust.
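As a rough illustration of how a top-$k$ retrieval audit with a JS-divergence-based score might be computed, the sketch below ranks images from a balanced corpus by their similarity to a profession prompt, tallies the demographic groups among the top $k$, and measures how far that distribution departs from uniform. The exact scoring formula is not given in this section, so the function names (`js_divergence`, `topk_bias_score`), the uniform reference distribution, and the base-2 logarithm are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topk_bias_score(similarities, group_ids, n_groups, k):
    """Hypothetical bias score for one prompt and one demographic attribute.

    similarities: image-text similarity of each image to a profession prompt
    group_ids:    integer group label (0 .. n_groups - 1) of each image
    Assumption: the corpus is balanced, so the reference is uniform.
    """
    similarities = np.asarray(similarities, dtype=float)
    group_ids = np.asarray(group_ids)
    top_idx = np.argsort(-similarities)[:k]  # indices of the k most similar images
    counts = np.bincount(group_ids[top_idx], minlength=n_groups)
    observed = counts / counts.sum()
    reference = np.full(n_groups, 1.0 / n_groups)
    return js_divergence(observed, reference)


# Toy example: six images, two groups; the three best matches are all group 0.
sims = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3]
groups = [0, 0, 0, 1, 1, 1]
skewed_score = topk_bias_score(sims, groups, n_groups=2, k=3)
balanced_score = topk_bias_score(sims, groups, n_groups=2, k=6)
```

Under these assumptions, a score of 0 means the retrieved set mirrors the balanced corpus, and larger values indicate stronger skew toward particular groups. In a real audit, the similarities would come from a VLM's image and text embeddings rather than hand-written numbers.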
