Evaluating Biases in Context-Dependent Health Questions
Sharon Levy, Tahilin Sanchez Karver, William D. Adler, Michelle R. Kaufman, Mark Dredze
TL;DR
This paper investigates biases that arise when chat-based LLMs answer health questions whose correct answer depends on user context, such as age, sex, and location. It builds a dataset of contextual sexual and reproductive health questions drawn from Planned Parenthood and Go Ask Alice. It then evaluates two chat models by comparing their responses with and without demographic context, using embedding-based similarity and human judgments. The results reveal consistent biases favoring younger (18–30) and female users, along with weaker but still present location-related effects. The age and location effects are statistically significant. The work highlights fairness concerns in health Q&A and suggests directions for making contextual health information more equitable, including covering more demographic groups and keeping external knowledge up to date.
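The following minimal Python sketch illustrates one way to score an embedding-based comparison of responses with and without context. It rests on two assumptions. First, each response has already been converted to a fixed-length embedding vector; the toy three-dimensional vectors stand in for real sentence embeddings. Second, a context-free answer "aligns" with whichever demographic group's context-conditioned answer has the highest cosine similarity to it. This summary does not confirm that rule. The function names `cosine_similarity`, `aligned_group`, and `tally_alignment` and the group labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# An embedding is assumed to be a plain list of floats (hypothetical stand-in
# for the output of a real sentence-embedding model).
Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def aligned_group(no_context: Vector, with_context: Dict[str, Vector]) -> str:
    """Return the group whose context-conditioned answer is most similar to
    the context-free answer (hypothetical argmax alignment rule)."""
    return max(with_context, key=lambda g: cosine_similarity(no_context, with_context[g]))


def tally_alignment(items: List[Tuple[Vector, Dict[str, Vector]]]) -> Counter:
    """Count, across questions, how often each group is the aligned one."""
    return Counter(aligned_group(nc, wc) for nc, wc in items)


if __name__ == "__main__":
    # Toy embeddings of answers to one question, conditioned on each group.
    groups = {"age 18-30": [0.8, 0.2, 0.0], "age 65+": [0.1, 0.9, 0.2]}
    print(aligned_group([0.9, 0.1, 0.0], groups))  # → age 18-30
    print(tally_alignment([([0.9, 0.1, 0.0], groups), ([0.0, 1.0, 0.1], groups)]))
```

If one group wins the tally across many questions far more often than the others, the model's default answers track that group. The paper's actual similarity model, alignment criterion, and significance testing may differ from this sketch.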
Abstract
Chat-based large language models could help individuals who lack access to high-quality healthcare obtain personalized information on a variety of topics. However, users may ask underspecified questions that require additional context for a model to answer correctly. We study how large language model biases manifest in these contextual questions in the healthcare domain. To do so, we curate a dataset of sexual and reproductive healthcare questions whose answers depend on age, sex, and location attributes. We compare models' outputs with and without demographic context to determine which demographic group each context-free answer aligns with. Our experiments reveal biases along each of these attributes, with young adult female users favored.
