Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare

Pardis Sadat Zahraei; Zahra Shakeri


TL;DR

We develop EthiClinician, a fine-tuned model built on the ChatDoctor framework that outperforms GPT-4 in both ethical reasoning and clinical judgment and sets a new benchmark for safer, more reliable patient outcomes.

Abstract

Biased AI-generated medical advice and misdiagnoses can jeopardize patient safety, making the integrity of AI in healthcare more critical than ever. As Large Language Models (LLMs) take on a growing role in medical decision-making, addressing their biases and improving their accuracy are key to delivering safe, reliable care. This study tackles these challenges directly by introducing new resources designed to promote ethical and precise AI in healthcare. We present two datasets: BiasMD, featuring 6,007 question-answer pairs crafted to evaluate and mitigate biases in health-related LLM outputs, and DiseaseMatcher, with 32,000 clinical question-answer pairs spanning 700 diseases, aimed at assessing symptom-based diagnostic accuracy. Using these datasets, we develop EthiClinician, a fine-tuned model built on the ChatDoctor framework that outperforms GPT-4 in both ethical reasoning and clinical judgment. By exposing and correcting hidden biases in existing healthcare models, our work sets a new benchmark for safer, more reliable patient outcomes.
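Both datasets are scored by comparing model outputs with reference answers. The abstract does not specify the record format or the exact metric, so the snippet below is only a minimal sketch of exact-match accuracy over question-answer pairs. The `QAItem` schema, its field names, the `normalize` rule, and the toy option labels are all hypothetical illustrations, not the actual structure of BiasMD or DiseaseMatcher.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One question-answer pair. HYPOTHETICAL schema, not the datasets' real format."""
    question: str
    gold_answer: str
    model_answer: str


def normalize(answer: str) -> str:
    # Assumed matching rule: compare case-insensitively with collapsed whitespace.
    return " ".join(answer.lower().split())


def accuracy(items: list[QAItem]) -> float:
    """Fraction of items whose normalized model answer equals the normalized gold answer."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(normalize(i.model_answer) == normalize(i.gold_answer) for i in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy items with placeholder option labels, not real dataset entries.
    demo = [
        QAItem("Toy question 1", "Option A", "option a"),
        QAItem("Toy question 2", "Option B", "Option A"),
    ]
    print(f"accuracy = {accuracy(demo):.2f}")
```

Exact match after light normalization suits closed-form answers such as option labels. Free-text answers would need a more lenient comparison, which this sketch does not attempt.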
