A Survey on Human-Centered Evaluation of Explainable AI Methods in Clinical Decision Support Systems
Alessandro Gambetti, Qiwei Han, Hong Shen, Claudia Soares
TL;DR
The paper addresses the limited systematic evidence on how explainable AI (XAI) explanations influence real-world adoption of clinical decision support systems (CDSS). It conducts a PRISMA-guided survey of 31 human-centered evaluations, revealing a dominance of post-hoc, model-agnostic explanations (e.g., SHAP, Grad-CAM) tested in small clinician samples, with explanations often increasing cognitive load and misaligning with domain reasoning. It contributes a socio-technical, stakeholder-centric evaluation framework and an iterative development protocol to align XAI with diverse stakeholders, aiming to produce trustworthy, clinically viable CDSS. The work has practical significance for guiding future CDSS design, evaluation, and deployment in healthcare, emphasizing augmentation of clinician expertise rather than replacement and highlighting areas for regulatory and workflow considerations.
Abstract
Explainable Artificial Intelligence (XAI) is essential for the transparency and clinical adoption of Clinical Decision Support Systems (CDSS). However, the real-world effectiveness of existing XAI methods remains limited and is inconsistently evaluated. This study conducts a systematic PRISMA-guided survey of 31 human-centered evaluations (HCE) of XAI applied to CDSS, classifying them by XAI methodology, evaluation design, and adoption barrier. Our findings reveal that over 80% of the studies adopt post-hoc, model-agnostic approaches such as SHAP and Grad-CAM, typically assessed through small-scale clinician studies in which sample sizes remain below 25 participants. Explanations generally improve clinician trust and diagnostic confidence, but they frequently increase cognitive load and are misaligned with domain reasoning processes. To bridge these gaps, we propose a stakeholder-centric evaluation framework that integrates socio-technical principles and human-computer interaction to guide the future development of clinically viable and trustworthy XAI-based CDSS.
