Mitigating the Risk of Health Inequity Exacerbated by Large Language Models

Yuelyu Ji; Wenhe Ma; Sonish Sivarajkumar; Hang Zhang; Eugene Mathew Sadhu; Zhuochun Li; Xizhi Wu; Shyam Visweswaran; Yanshan Wang

TL;DR

The paper addresses the risk that Large Language Models (LLMs) worsen health inequity in two tasks, clinical trial matching (CTM) and medical question answering (MQA), by showing that sociodemographic details in the input can bias model outputs. It introduces EquityGuard, a contrastive-learning framework that disentangles social determinants of health (SDOH) from task embeddings to reduce inequities without sacrificing task performance. Across CTM and MQA datasets and several LLMs, EquityGuard yields more uniform task outcomes and narrows fairness gaps as measured by Equal Opportunity (EO) and Demographic Parity (DP), with notable improvements in error rates for underrepresented groups. The framework advances equitable AI in clinical settings and highlights practical considerations for deploying LLMs in healthcare while mitigating disparities.

Abstract

Recent advancements in large language models (LLMs) have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non-decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM-based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.

Paper Structure (19 sections, 2 equations, 7 figures, 16 tables)


Introduction
Results
Comparison of Equity in LLMs
Fairness and Correlation Analysis
Inequity Mitigation in Clinical Trial Matching
Inequity Mitigation in Medical Question Answering
Enhanced Fairness Metrics
Overall Impact of EquityGuard
Method
Overview
Data Processing
EquityGuard Framework
Model Architecture and Training
Evaluation
Discussion and Limitations
...and 4 more sections

Figures (7)

Figure 1: This figure illustrates inequities that arise when applying LLMs to two healthcare tasks: Clinical Trial Matching (left) and Medical Question Answering (right). On the left, adding race information (e.g., Native American) to the patient note—despite it being irrelevant to the outcome—resulted in altered clinical trial recommendations generated by the LLMs. On the right, including race and sex information (e.g., African American and female) in the question, which should not affect the response, led to incorrect answers from the LLMs.
Figure 2: Performance of LLMs in the clinical trial matching (measured by NDCG@10—the higher, the better) and medical question answering (measured by error rate—the lower, the better) tasks. This figure compares the performance of various LLMs when specific SDOH factors were introduced into the dataset. The SDOH factors considered include race, sex, low income, LGBT+ status, homelessness, illiteracy, disability, and unemployment. Each sensitive attribute was incorporated into the input data for both the CTM and MQA tasks during the evaluation.
Figure 3: Equal Opportunity (EO) and Demographic Parity (DP) are fairness metrics used to assess equity in LLMs [anderson2024measuring]. Higher scores in EO and DP indicate better equity, with EO focusing on ensuring equal positive outcomes for qualified individuals across groups, and DP evaluating overall equity across all groups.
Figure 4: Correlation heatmaps of inequity categories in the Clinical Trial Matching (CTM) and Medical Question Answering (MQA) tasks. Left: Correlation between inequity categories in the CTM task, illustrating how different inequity-modified queries resulted in similar trial rankings or selections by the models. High correlation coefficients suggest that the model's outputs for these inequity categories are highly aligned, indicating interconnected biases. Right: Correlation between inequity categories in the MQA task, displaying how often different inequity-modified queries led to the same answers or error patterns. These heatmaps help analyze how inequities across categories are interconnected, impacting model fairness across tasks.
Figure 5: Fairness metrics for LLaMA3 8B models. Models with EquityGuard (w/ EquityGuard) show reduced EO and DP differences, indicating enhanced fairness.
...and 2 more figures
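
The captions for Figures 2 and 3 name three metrics: NDCG@10, Equal Opportunity (EO), and Demographic Parity (DP). The Python sketch below illustrates the standard textbook forms of these metrics, not the paper's own evaluation code. Several details are assumptions: the function names are hypothetical, the DCG uses linear gain, and fairness is reported as a max-minus-min gap between groups (lower is fairer). The captions describe EO and DP as scores where higher is better, so the paper's exact scoring may differ from this gap formulation.

```python
import math
from collections import defaultdict


def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked list of graded relevance labels, given in rank order.

    Assumption: linear gain (rel / log2(rank + 1)); other gain functions exist.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate (used for DP) and true-positive rate (used for EO).

    y_true and y_pred hold binary labels (0 or 1); groups holds a group label per example.
    """
    stats = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["pred_pos"] += p
        s["pos"] += t
        s["tp"] += t * p  # counts only examples that are positive and predicted positive
    return {
        g: {
            "positive_rate": s["pred_pos"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }


def fairness_gap(rates, key):
    """Largest minus smallest per-group rate; 0.0 means the groups are treated alike."""
    vals = [r[key] for r in rates.values() if not math.isnan(r[key])]
    return max(vals) - min(vals)


# Toy data (made up for illustration): two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp_gap = fairness_gap(rates, "positive_rate")  # demographic parity gap
eo_gap = fairness_gap(rates, "tpr")            # equal opportunity gap
print(f"DP gap: {dp_gap:.2f}, EO gap: {eo_gap:.2f}")
print(f"NDCG@10 of a reversed ranking: {ndcg_at_k([0, 1, 2]):.3f}")
```

In a setting like the one the captions describe, `groups` could hold the sociodemographic attribute injected into each query, so that the gaps compare model behavior across otherwise identical inputs.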
