Generalizable and Explainable Deep Learning for Medical Image Computing: An Overview

Ahmad Chaddad, Yan Hu, Yihang Wu, Binbin Wen, Reem Kateb

TL;DR

This paper surveys the role of generalizability and explainability in deep learning for medical image analysis, arguing that clinical deployment requires transparent and robust models. It implements four CNN backbones across three public datasets and evaluates five local XAI methods using the ROAD metric, supplemented by paired t-tests and timing analysis to assess both accuracy and explainability efficiency. The findings indicate that XGradCAM and AblationCAM often provide clearer localization of pathological regions and higher confidence gains in several tasks, while methods like EigenGradCAM may underperform in complex skin-cancer cases; LayerCAM and XGradCAM offer favorable trade-offs between speed and interpretability. The work highlights practical implications for clinical adoption and suggests avenues such as hybrid XAI techniques and more robust, diverse benchmarking to advance reliable, generalizable medical imaging solutions.

Abstract

Objective. This paper presents an overview of generalizable and explainable artificial intelligence (XAI) in deep learning (DL) for medical imaging, addressing the urgent need for transparency and explainability in clinical applications. Methodology. We apply four CNNs to three medical datasets (brain tumor, skin cancer, and chest x-ray) for medical image classification, and perform paired t-tests to assess the significance of the differences observed between methods. We further combine ResNet50 with five common XAI techniques to explain model predictions and improve model transparency, and we use a quantitative metric (confidence increase) to evaluate the usefulness of the XAI techniques. Key findings. The experimental results indicate that ResNet50 achieves feasible accuracy and F1 score on all datasets (e.g., 86.31% accuracy on skin cancer). While certain XAI methods, such as XGradCAM, effectively highlight relevant abnormal regions in medical images, others, such as EigenGradCAM, may perform less effectively in specific scenarios. In addition, XGradCAM yields a higher confidence increase (e.g., 0.12 for glioma tumor) than GradCAM++ (0.09) and LayerCAM (0.08). Implications. Based on the experimental results and recent advancements, we outline future research directions to enhance the robustness and generalizability of DL models in biomedical imaging.
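The paired t-test mentioned in the methodology can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `paired_t_statistic` and the accuracy values are hypothetical placeholders and are not results from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two paired observations")
    # The test operates on the per-pair differences, not on the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run accuracies for two models (illustrative only).
model_a = [0.86, 0.84, 0.88, 0.85, 0.87]
model_b = [0.82, 0.83, 0.85, 0.81, 0.84]
t_value, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_value:.3f}, df = {dof}")
```

The p-value would then be read from a Student's t distribution with `dof` degrees of freedom; a statistics library can do both steps in a single call.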

Paper Structure

This paper contains 17 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Timeline of recent XAI techniques.
  • Figure 2: Example of the impact of XAI on clinical decisions, shown for three diagnostic scenarios. (First row) Clinicians use medical images and records for diagnosis. (Second row) AI-assisted diagnosis handles complex data but lacks transparency, making trust difficult. (Third row) Explainable AI (XAI) enhances trust by providing interpretable predictions, combining AI's data processing with clinician expertise for better decisions.
  • Figure 3: Heatmap visualization using five XAI techniques (ResNet50) and the quantitative metrics obtained with the ROAD method on the brain tumor dataset. Deeper colors, such as red, mark the regions on which the model's attention is more focused. The number shown in each sub-figure is the confidence increase measured with the ROAD technique, averaged across four thresholds (positive is better).
  • Figure 4: Heatmap visualization using five XAI techniques (ResNet50) and the quantitative metrics obtained with the ROAD method on the chest x-ray dataset.
  • Figure 5: Heatmap visualization using five XAI techniques (ResNet50) and the quantitative metrics obtained with the ROAD method on the skin cancer dataset.
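The number reported in each sub-figure of Figure 3 is a confidence increase averaged across four thresholds. The arithmetic can be sketched as below, assuming the model's confidence for the target class has already been measured on the original image and on each ROAD-perturbed image. The perturbation step itself is not shown, and the function name, threshold keys, and confidence values are hypothetical placeholders, not values from the paper.

```python
from statistics import mean


def average_confidence_increase(original_confidence, perturbed_confidences):
    """Average (perturbed - original) confidence over all thresholds.

    perturbed_confidences maps a threshold label to the target-class
    confidence measured after perturbing the image at that threshold.
    A positive result means the perturbation raised the confidence.
    """
    if not perturbed_confidences:
        raise ValueError("need at least one threshold")
    increases = [c - original_confidence for c in perturbed_confidences.values()]
    return mean(increases)


# Hypothetical values: four thresholds, labels chosen for illustration only.
original = 0.70
perturbed = {20: 0.78, 40: 0.80, 60: 0.82, 80: 0.84}
print(round(average_confidence_increase(original, perturbed), 4))
```

Averaging over several thresholds makes the score less sensitive to any single choice of how many pixels are perturbed.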