Evaluating Biases in Context-Dependent Health Questions

Sharon Levy, Tahilin Sanchez Karver, William D. Adler, Michelle R. Kaufman, Mark Dredze

TL;DR

This paper investigates biases that arise when chat-based LLMs answer health questions that depend on user context, such as age, sex, and location. It builds a dataset of contextual sexual and reproductive health questions from Planned Parenthood and Go Ask Alice and evaluates two chat models by comparing responses with and without demographic context using embedding-based similarity and human judgments. The results reveal consistent biases toward younger (18–30) and female users and show less pronounced but present location-related effects, with statistical significance for age and location. The work highlights fairness concerns in health Q&A and suggests directions for making contextual health information more equitable, including expanding demographics and keeping external knowledge up to date.

Abstract

Chat-based large language models have the opportunity to empower individuals lacking high-quality healthcare access to receive personalized information across a variety of topics. However, users may ask underspecified questions that require additional context for a model to correctly answer. We study how large language model biases are exhibited through these contextual questions in the healthcare domain. To accomplish this, we curate a dataset of sexual and reproductive healthcare questions that are dependent on age, sex, and location attributes. We compare models' outputs with and without demographic context to determine group alignment among our contextual questions. Our experiments reveal biases in each of these attributes, where young adult female users are favored.
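The comparison described above, between a model's answer without context and its answers with each demographic context, can be illustrated with a minimal sketch. It assumes each response has already been converted to an embedding vector and uses cosine similarity as the alignment score. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not the authors' exact procedure.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_alignment(no_context_vec, context_vecs):
    """Score how close the no-context answer is to each group's contextual answer.

    no_context_vec: embedding of the answer to the question asked without context.
    context_vecs:   dict mapping a group label to the embedding of the answer
                    produced when that group's context is added to the question.
    """
    return {group: cosine_similarity(no_context_vec, vec)
            for group, vec in context_vecs.items()}

def most_aligned_group(no_context_vec, context_vecs):
    """Return the group whose contextual answer is most similar to the no-context answer."""
    scores = group_alignment(no_context_vec, context_vecs)
    return max(scores, key=scores.get)

# Toy, hand-written vectors standing in for real response embeddings (hypothetical).
no_context = [0.9, 0.1, 0.3]
with_context = {
    "female": [0.8, 0.2, 0.3],
    "male": [0.1, 0.9, 0.4],
}

scores = group_alignment(no_context, with_context)
favored = most_aligned_group(no_context, with_context)
print(scores, favored)
```

If the no-context answer is consistently closer to one group's contextual answer across many questions, that pattern is the kind of group alignment the abstract refers to.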

Paper Structure (8 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: A model's answer is biased toward the female demographic when asked the question without context.
  • Figure 2: Sex-based annotation instructions for human evaluation.
  • Figure 3: Example of a sex-based question from the survey for human annotations.
  • Figure 4: Age-based annotation instructions for human evaluation.
  • Figure 5: Example of an age-based question from the Prolific survey for human annotations.
  • ...and 2 more figures