Dr. Bias: Social Disparities in AI-Powered Medical Guidance
Emma Kondrup, Anne Imouza
TL;DR
Dr. Bias investigates whether AI-powered medical guidance produced by LLMs encodes social disparities across demographic groups. The authors implement a two-step generation pipeline using Llama-3-8B-Instruct to produce 42,000 medical-advice messages across 84 demographic profiles and five medical categories, then quantify readability, length, and sentiment to detect group differences. They find that Indigenous and intersex identities receive more complex, harder-to-read advice, with intersectional identities amplifying these effects, particularly in mental health domains. The work underscores the need for AI literacy and robust mitigation strategies in AI deployment for healthcare to prevent unjust disparities and calls for finer-grained data and stakeholder-inclusive evaluation.
Abstract
With the rapid progress of Large Language Models (LLMs), the general public now has easy and affordable access to applications capable of answering most health-related questions in a personalized manner. These LLMs are proving increasingly competitive with professionals, and now even surpass them in some medical capabilities. They hold particular promise in low-resource settings, since they offer widely accessible, quasi-free healthcare support. However, the evaluations that fuel this enthusiasm largely lack insight into the social nature of healthcare: they overlook health disparities between social groups and how bias may translate into LLM-generated medical advice and impact users. We provide an exploratory analysis of LLM answers to a series of medical questions spanning key clinical domains, simulating these questions being asked by patient profiles that vary in sex, age range, and ethnicity. By comparing natural language features of the generated responses, we show that LLMs used for medical advice generation produce responses that differ systematically between social groups. In particular, Indigenous and intersex patients receive advice that is less readable and more complex. These trends are amplified when intersectional groups are considered. Given the increasing trust individuals place in these models, we argue for higher AI literacy and for urgent investigation and mitigation by AI developers, so that these systemic differences are diminished and do not translate into unjust patient support. Our code is publicly available on GitHub.
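To make the idea of comparing natural language features across patient profiles concrete, the following is a minimal sketch and not the authors' implementation. It assumes Flesch Reading Ease as the readability measure (the abstract does not name a specific metric), uses a crude vowel-run heuristic for syllables, and relies on hypothetical names (`text_features`, `group_means`) and invented toy responses that are purely illustrative, not results from the paper.

```python
import re
from collections import defaultdict
from statistics import mean


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_features(text: str) -> dict:
    """Return word count and Flesch Reading Ease for one response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"n_words": 0, "flesch": None}
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher scores mean easier-to-read text.
    flesch = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    return {"n_words": len(words), "flesch": flesch}


def group_means(responses):
    """Average features per group; `responses` is a list of (group, text) pairs."""
    by_group = defaultdict(list)
    for group, text in responses:
        by_group[group].append(text_features(text))
    summary = {}
    for group, feats in by_group.items():
        scores = [f["flesch"] for f in feats if f["flesch"] is not None]
        summary[group] = {
            "n_words": mean(f["n_words"] for f in feats),
            "flesch": mean(scores) if scores else None,
        }
    return summary


# Toy, invented responses for two hypothetical profiles (not data from the paper).
toy_responses = [
    ("profile_a", "Drink water. Rest well."),
    ("profile_b", "Comprehensive rehydration methodologies necessitate "
                  "individualized electrolyte supplementation considerations."),
]
summary = group_means(toy_responses)
for group, feats in summary.items():
    print(group, feats)
```

In a real analysis, the per-group averages would be computed over many generated responses per profile and compared with an appropriate statistical test, and an established readability library could replace the heuristic syllable counter.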
