A Counterfactual LLM Framework for Detecting Human Biases: A Case Study of Sex/Gender in Emergency Triage

Ariel Guerra-Adames, Marta Avalos-Fernandez, Océane Dorémus, Leo Anthony Celi, Cédric Gil-Jardiné, Emmanuel Lagarde

TL;DR

This paper addresses gender bias in emergency triage by introducing a counterfactual LLM framework that emulates human decisions and compares predictions across gender-flipped presentations. It couples a multimodal triage model with a text-and-tabular counterfactual generator and a suite of directional bias metrics, including $PDR$, $DTS$, $NMD$, and $NATS$, to quantify asymmetries in decision-making. The authors validate the approach on data from the Bordeaux University Hospital (CHU de Bordeaux) and MIMIC-IV, showing a consistent bias in which female presentations are more likely to receive a less severe triage score (approximately 2.1% more likely in the Bordeaux cohort), with sizable implications at national scale in France. They further demonstrate modality-specific effects, replicate the finding across countries, and discuss the influence of pre-training, underscoring the framework’s utility for scalable bias audits across domains. Overall, the work offers a practical, domain-agnostic tool for auditing and addressing inequities in real-world decision-making beyond emergency care.

Abstract

We present a novel, domain-agnostic counterfactual approach that uses Large Language Models (LLMs) to quantify gender disparities in human clinical decision-making. The method trains an LLM to emulate observed decisions, then evaluates counterfactual pairs in which only gender is flipped, estimating directional disparities while holding all other clinical factors constant. We study emergency triage, validating the approach on more than 150,000 admissions to the Bordeaux University Hospital (France) and replicating results on a subset of MIMIC-IV across a different language, population, and healthcare system. In the Bordeaux cohort, otherwise identical presentations were approximately 2.1% more likely to receive a lower-severity triage score when presented as female rather than male; scaled to national emergency volumes in France, this corresponds to more than 200,000 lower-severity assignments per year. Modality-specific analyses indicate that both explicit tabular gender indicators and implicit textual gender cues contribute to the disparity. Beyond emergency care, the approach supports bias audits in other settings (e.g., hiring, academic, and justice decisions), providing a scalable tool to detect and address inequities in real-world decision-making.
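The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes the fine-tuned model has already scored each presentation twice (once as male, once as female) and that a higher score means lower severity. The `CounterfactualPair` type and the `directional_disparity` function are hypothetical names introduced here, and the quantity computed is a simple illustrative rate difference, not the paper's $PDR$, $DTS$, $NMD$, or $NATS$ definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterfactualPair:
    """Predicted triage scores for one presentation under both genders.

    Assumption: a higher score means a less severe triage level.
    """
    score_male: int
    score_female: int


def directional_disparity(pairs):
    """Compare how often each gender-flipped version gets the less severe score.

    Returns the share of pairs where the female version is scored less severe,
    the share where the male version is scored less severe, and their difference.
    """
    if not pairs:
        raise ValueError("at least one counterfactual pair is required")
    n = len(pairs)
    female_less_severe = sum(p.score_female > p.score_male for p in pairs)
    male_less_severe = sum(p.score_male > p.score_female for p in pairs)
    return {
        "female_less_severe_rate": female_less_severe / n,
        "male_less_severe_rate": male_less_severe / n,
        "net_difference": (female_less_severe - male_less_severe) / n,
    }


# Toy predictions, invented purely for illustration (not data from the paper).
example_pairs = [
    CounterfactualPair(score_male=3, score_female=4),
    CounterfactualPair(score_male=2, score_female=2),
    CounterfactualPair(score_male=3, score_female=3),
    CounterfactualPair(score_male=2, score_female=3),
]
result = directional_disparity(example_pairs)
print(result)
```

A positive `net_difference` would indicate that, across otherwise identical presentations, the female version is assigned the less severe score more often than the male version. Because everything except gender is held fixed within a pair, the asymmetry can be attributed to the gender flip alone.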

Paper Structure

This paper contains 64 sections, 27 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Overview of the counterfactual framework for gender bias detection in emergency triage: (1) LLM fine-tuning for the task of emergency triage, (2) generation of counterfactual pairs according to a sensitive variable like sex/gender, (3) comparison of triage predictions by the fine-tuned LLM on counterfactual pairs.
  • Figure 2: Performance of triage models on the Bordeaux University Hospital ED test set, evaluated with quadratically weighted Cohen’s $\kappa$. Models are ordered by parameter size. The red dashed horizontal line indicates the performance of the RF/TF-IDF baseline.
  • Figure 3: Bias metrics for the CHU de Bordeaux ED and MIMIC-IV datasets.
  • Figure 4: Bias metrics with and without pre-training.
  • Figure 5: Distributions of triage score, patient age and patient gender from the raw dataset (before filtering).
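Figure 2 reports model performance as quadratically weighted Cohen’s $\kappa$, which penalizes a disagreement between two ordinal scores by the square of their distance. The sketch below is a plain-Python illustration of that standard statistic, not the authors' evaluation code. It assumes triage levels are encoded as integers from 0 to `n_classes - 1`, and the function name `quadratic_weighted_kappa` is hypothetical.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratically weighted Cohen's kappa for ordinal labels in 0..n_classes-1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("at least two classes are required")
    n = len(y_true)

    # Observed confusion matrix: rows are reference labels, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(observed[i]) for i in range(n_classes)]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic weight: 0 on the diagonal, 1 for the largest possible gap.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all labels fall in a single class")
    return 1.0 - weighted_observed / weighted_expected


# Toy labels, invented for illustration only.
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_classes=2)
print(perfect, chance)
```

A value of 1 indicates perfect agreement with the reference triage scores and a value near 0 indicates agreement no better than chance. The quadratic weighting means that predicting a score one level off costs far less than predicting one several levels off, which suits an ordinal scale such as triage severity.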