The Impact of AI Explanations on Clinicians Trust and Diagnostic Accuracy in Breast Cancer

Olya Rezaeian, Onur Asan, Alparslan Emrah Bayrak

TL;DR

This study investigates whether varying levels of AI explanation in a breast cancer clinical decision support system (CDSS) affect clinicians' trust and diagnostic accuracy. Using an interrupted time-series design with 28 clinicians, the experiment exposes participants to four levels of AI explanation while they diagnose breast-tissue images. Results show that richer explanations do not consistently improve trust or performance and can degrade understandability and decision efficiency, though AI support generally boosts accuracy relative to baseline. Demographic factors influence self-reported AI familiarity but do not systematically alter behavioral trust or performance, underscoring the need for careful, context-driven design of explainable AI in high-stakes clinical settings.

Abstract

Advances in machine learning have created new opportunities to develop artificial intelligence (AI)-based clinical decision support systems using past clinical data and to improve diagnosis decisions in life-threatening illnesses such as breast cancer. Providing explanations for AI recommendations is a possible way to address trust and usability issues in black-box AI systems. This paper presents the results of an experiment to assess the impact of varying levels of AI explanations on clinicians' trust and diagnosis accuracy in a breast cancer application and the impact of demographics on the findings. The study includes 28 clinicians with varying medical roles related to breast cancer diagnosis. The results show that increasing levels of explanations do not always improve trust or diagnosis performance. The results also show that while some of the self-reported measures such as AI familiarity depend on gender, age and experience, the behavioral assessments of trust and performance are independent of those variables.

Paper Structure

This paper contains 24 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Experiment platform
  • Figure 2: Interactive evaluation: Participants providing their input when in disagreement with AI during the diagnostic process
  • Figure 3: Age and experience distribution of participants
  • Figure 4: Distribution of AI-related measures by gender
  • Figure 5: Distribution of AI-related measures by experience
  • ...and 5 more figures