Leveraging Expert Input for Robust and Explainable AI-Assisted Lung Cancer Detection in Chest X-rays

Amy Rafferty, Rishi Ramaesh, Ajitha Rajan

TL;DR

The paper addresses the challenge of deploying AI for lung cancer detection in chest X-rays by evaluating the clinical utility of existing XAI techniques and the robustness of the underlying model to adversarial attacks. It finds that post-hoc image-based and text-based XAI techniques often fail to provide clinically meaningful explanations, and that expert input is crucial for clinical relevance. The authors introduce ClinicXAI, an expert-driven Concept Bottleneck Model that uses radiologist-curated concepts to produce interpretable explanations while preserving high diagnostic accuracy. ClinicXAI achieves superior concept reliability and radiologist-assessed clinical utility, and it is more resistant to adversarial perturbations than the original InceptionV3 model, demonstrating the practical value of domain-guided interpretable AI in healthcare. The work points toward more trustworthy AI systems in medical diagnostics through close collaboration with clinicians and domain-specific concept definitions.

Abstract

Deep learning models show significant potential for advancing AI-assisted medical diagnostics, particularly in detecting lung cancer through medical image modalities such as chest X-rays. However, the black-box nature of these models poses challenges to their interpretability and trustworthiness, limiting their adoption in clinical practice. This study examines both the interpretability and robustness of a high-performing lung cancer detection model based on InceptionV3, utilizing a public dataset of chest X-rays and radiological reports. We evaluate the clinical utility of multiple explainable AI (XAI) techniques, including both post-hoc and ante-hoc approaches, and find that existing methods often fail to provide clinically relevant explanations, displaying inconsistencies and divergence from expert radiologist assessments. To address these limitations, we collaborated with a radiologist to define diagnosis-specific clinical concepts and developed ClinicXAI, an expert-driven approach leveraging the concept bottleneck methodology. ClinicXAI generated clinically meaningful explanations which closely aligned with the practical requirements of clinicians while maintaining high diagnostic accuracy. We also assess the robustness of ClinicXAI in comparison to the original InceptionV3 model by subjecting both to a series of widely utilized adversarial attacks. Our analysis demonstrates that ClinicXAI exhibits significantly greater resilience to adversarial perturbations. These findings underscore the importance of incorporating domain expertise into the design of interpretable and robust AI systems for medical diagnostics, paving the way for more trustworthy and effective AI solutions in healthcare.

Paper Structure (21 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Example of a cancerous radiology report from the MIMIC-CXR dataset. Clinical concepts extracted ('Nodule') are highlighted by bounding box. Note the negative mention in the final paragraph is not extracted.
  • Figure 2: For LIME, SHAP and Grad-CAM, (a) shows mean pixel overlap between techniques on the MIMIC-CXR test set. (b) shows mean medical ground truth captured on the VinDr-CXR test set. Each technique is applied post-hoc to InceptionV3, trained with 10-fold cross-validation on the MIMIC-CXR dataset. Error bars are excluded due to negligible size.
  • Figure 3: Explanations generated by each XAI technique for a cancerous chest X-ray. (a) shows the ground truth hilar mass. (b) shows LIME (most important = intense green). (c) shows SHAP (most important = green). (d) shows Grad-CAM (most important = red). (e) shows XCBs (concepts with the 5 highest absolute values, positive (+) or negative (-)). (f) shows the radiology report generated by CXR-LLaVA. (g) shows our expert-driven CBM, ClinicXAI, with the top 2 scoring concepts.
  • Figure 4: Analysis by an expert radiologist of explanations generated for a subset of 40 cancerous (a) and 20 healthy (b) chest X-rays for each of the XAI techniques evaluated in this study. The expert was asked to score explanations between 0 and 3 based on their clinical relevance (see legend).
  • Figure 5: Inference pipeline for ClinicXAI takes a chest X-ray as input, which is fed into a trained concept prediction model, producing prediction scores for a pre-set list of clinical concepts. These scores are then input to a trained label prediction model, which outputs the binary classification label.
  • ...and 2 more figures
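
The two-stage inference pipeline described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration of the general concept bottleneck idea, not the authors' implementation: the concept names, the toy concept predictor, the logistic label model, and all weights are hypothetical placeholders chosen so that the example runs without a trained network.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical concept list; the radiologist-curated list used by ClinicXAI
# is not reproduced here.
CONCEPTS = ["nodule", "mass", "opacity"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def toy_concept_model(image: Sequence[float]) -> List[float]:
    """Stand-in for a trained concept predictor (assumption).

    Maps a flattened image to one score in (0, 1) per concept, using only
    the mean pixel intensity so the example stays self-contained.
    """
    mean = sum(image) / len(image)
    return [
        sigmoid(4.0 * (mean - 0.5)),
        sigmoid(2.0 * (mean - 0.5)),
        sigmoid(-3.0 * (mean - 0.5)),
    ]


@dataclass
class ConceptBottleneck:
    """Stage 1: image -> concept scores. Stage 2: concept scores -> label."""

    concept_model: Callable[[Sequence[float]], List[float]]
    label_weights: List[float]  # hypothetical logistic-regression weights
    label_bias: float
    threshold: float = 0.5

    def predict(self, image: Sequence[float]) -> Dict[str, object]:
        scores = self.concept_model(image)
        if len(scores) != len(self.label_weights):
            raise ValueError("one weight is required per concept score")
        # The label model sees only the concept scores, never the pixels.
        logit = sum(w * s for w, s in zip(self.label_weights, scores))
        prob = sigmoid(logit + self.label_bias)
        ranked = sorted(zip(CONCEPTS, scores), key=lambda kv: kv[1], reverse=True)
        return {
            "label": int(prob >= self.threshold),  # binary classification
            "probability": prob,
            "concept_scores": dict(zip(CONCEPTS, scores)),
            "top_concepts": ranked[:2],  # reported as the explanation
        }


if __name__ == "__main__":
    model = ConceptBottleneck(toy_concept_model, [2.0, 1.5, 0.5], -2.0)
    result = model.predict([0.9, 0.8, 0.7])
    print(result["label"], result["top_concepts"])
```

Because the label model receives only the concept scores, the highest-scoring concepts can be reported directly alongside the prediction, which mirrors the "top 2 scoring concepts" shown for ClinicXAI in Figure 3.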