Towards Explainable Conversational AI for Early Diagnosis with Large Language Models

Maliha Tabassum; M Shamim Kaiser

TL;DR

This work presents an explainable, interactive diagnostic chatbot built on GPT-4o and Retrieval-Augmented Generation to address diagnostic delays and limited access to specialists. By combining a RAG knowledge base of 14 diseases with adaptive questioning, symptom tracking, and chain-of-thought-inspired reasoning, the system achieves 90.3% Top-1 accuracy and 100% Top-3 accuracy, outperforming classical ML baselines. The authors implement layered explainability, real-time state updates, and evidence-linked final diagnoses to enhance clinician trust, while deploying bias and hallucination controls. The approach is particularly relevant for low-resource settings, offering accessible, transparent, and clinically actionable diagnostic support with potential integration into electronic health records and multilingual deployments.

Abstract

Healthcare systems around the world are grappling with inefficient diagnostics, rising costs, and limited access to specialists. These problems often lead to delayed treatment and poor health outcomes. Most current AI and deep learning diagnostic systems offer little interactivity or transparency, which limits their effectiveness in real-world, patient-centered environments. This research introduces a diagnostic chatbot powered by a Large Language Model (LLM), combining GPT-4o, Retrieval-Augmented Generation, and explainable AI techniques. The chatbot engages patients in a dynamic conversation, extracting and normalizing symptoms while prioritizing potential diagnoses through similarity matching and adaptive questioning. With Chain-of-Thought prompting, the system also exposes the reasoning behind its diagnoses. When compared with traditional machine learning models (Naive Bayes, Logistic Regression, SVM, Random Forest, and KNN), the LLM-based system achieved an accuracy of 90% and a Top-3 accuracy of 100%. These findings offer a promising outlook for more transparent, interactive, and clinically relevant AI in healthcare.
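
The abstract names three mechanisms without showing them: ranking diagnoses by similarity matching, choosing follow-up questions adaptively, and scoring with Top-k accuracy. The sketch below is a minimal illustration of those ideas and not the authors' implementation. It assumes a toy three-disease knowledge base, uses Jaccard overlap as a stand-in for whatever similarity measure the system actually applies, and picks the next question with a simple "split the candidates in half" heuristic. All names (`KNOWLEDGE_BASE`, `rank_diseases`, `next_question`, `top_k_accuracy`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy knowledge base (disease -> characteristic symptoms).
# The paper's RAG knowledge base covers 14 diseases; these entries are invented.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "influenza": {"fever", "cough", "body ache", "fatigue"},
    "migraine": {"headache", "nausea", "light sensitivity"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhea", "fever"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap, used here as an assumed stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(symptoms: Set[str]) -> List[str]:
    """Return diseases sorted by descending similarity (ties broken by name)."""
    scores = {d: jaccard(symptoms, s) for d, s in KNOWLEDGE_BASE.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def next_question(symptoms: Set[str], candidates: List[str]) -> Optional[str]:
    """Pick an unreported symptom that best splits the candidate diseases."""
    pool = set().union(*(KNOWLEDGE_BASE[d] for d in candidates)) - symptoms
    if not pool:
        return None

    def imbalance(symptom: str) -> float:
        # Distance from an even split: 0 means half the candidates have it.
        count = sum(symptom in KNOWLEDGE_BASE[d] for d in candidates)
        return abs(count - len(candidates) / 2)

    # Sorting first makes the choice deterministic when several symptoms tie.
    return min(sorted(pool), key=imbalance)


def top_k_accuracy(cases: List[Tuple[Set[str], str]], k: int) -> float:
    """Fraction of cases whose true disease appears in the top-k ranking."""
    hits = sum(truth in rank_diseases(symptoms)[:k] for symptoms, truth in cases)
    return hits / len(cases)


if __name__ == "__main__":
    reported = {"nausea"}
    ranking = rank_diseases(reported)
    print(ranking)
    print(next_question(reported, ranking[:2]))
```

Top-1 accuracy counts a case as correct only when the true disease is ranked first, while Top-3 accuracy counts it as correct when the true disease appears anywhere in the first three positions, which is why the two reported figures can differ.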
