Explainable Deep Learning for Brain Tumor Classification: Comprehensive Benchmarking with Dual Interpretability and Lightweight Deployment

Md. Mohaiminul Islam, Md. Mofazzal Hossen, Maher Ali Rusho, Nahiyan Nazah Ridita, Zarin Tasnia Shanta, Md. Simanto Haider, Ahmed Faizul Haque Dhrubo, Md. Khurshid Jahan, Mohammad Abdul Qayum

TL;DR

The paper addresses automated brain tumor classification from MRI by benchmarking six architectures (five ImageNet-pretrained CNNs and a 1.31M-parameter lightweight CNN) within a fully standardized preprocessing and training workflow. It integrates dual explainability (Grad-CAM and GradientShap) to localize anatomically meaningful regions, and it reports IoU, Hausdorff distance, ASD, and confusion matrices alongside conventional accuracy, precision, recall, and F1-score. The Inception-ResNet V2 model achieves a state-of-the-art testing accuracy of 99.53% (with near-perfect precision, recall, F1, and IoU), while the lightweight CNN reaches 96.49% accuracy and enables real-time edge deployment on resource-constrained devices. The framework is presented as deployable across clinical settings, from large centers to rural clinics, by balancing accuracy, interpretability, and efficiency. It acknowledges limitations such as dataset size, lack of external validation, and the need for prospective trials and uncertainty quantification before real-world adoption.
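As a rough illustration of two of the overlap metrics named above, the sketch below computes IoU and the symmetric Hausdorff distance between two binary regions given as sets of (row, col) pixel coordinates. This is an assumption-laden toy, not the paper's evaluation code: the summary does not say how the regions are obtained (for example, how a saliency map is thresholded or what reference mask is used), and the names `iou`, `hausdorff`, `pred`, and `ref` are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two sets of (row, col) pixel coordinates."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both regions empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return len(a & b) / len(union)


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance (in pixels) between two non-empty point sets."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty region")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy example: two 2x2 squares that overlap in a single pixel.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}  # hypothetical highlighted region
ref = {(1, 1), (1, 2), (2, 1), (2, 2)}   # hypothetical reference region

print(f"IoU = {iou(pred, ref):.4f}")
print(f"Hausdorff = {hausdorff(pred, ref):.4f} px")
```

IoU rewards overlap in area, while the Hausdorff distance is driven by the single worst-matched pixel, so the two can disagree when a region is mostly right but has one stray outlier.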

Abstract

Our study presents a complete deep learning system for automated classification of brain tumors from MRI images. It benchmarks six architectures: five ImageNet-pretrained models (VGG-16, Inception V3, ResNet-50, Inception-ResNet V2, Xception) and a custom-built, compact CNN (1.31M parameters). The study advances the field in four ways: (1) fully standardized assessment, in which preprocessing, training protocol (AdamW optimizer, CosineAnnealingLR, early stopping with patience = 7), and performance metrics were identical across all models; (2) a high level of confidence in the localizations, based on prior studies, as both Grad-CAM and GradientShap explanations were used to identify anatomically important and meaningful attention regions and to address the black-box issue; (3) a compact 1.31-million-parameter CNN that achieved 96.49% testing accuracy and is 100 times smaller than Inception-ResNet V2 while permitting real-time inference (375 ms) on edge devices; (4) evaluation beyond accuracy, reporting intersection over union, Hausdorff distance, precision-recall curves, and confusion matrices across all splits. Inception-ResNet V2 reached state-of-the-art performance, achieving 99.53% testing accuracy with precision, recall, and F1-score of at least 99.50%, exceeding the metrics reported in recent studies. The lightweight model is suitable for deployment on devices without multi-GPU infrastructure in under-resourced settings. This end-to-end solution weighs accuracy, interpretability, and deployability to provide a trustworthy-AI framework for performance assessment and deployment in both advanced and low-resource healthcare systems, at the level of clinical screening and triage.
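The training protocol in the abstract pairs a cosine-annealed learning rate with early stopping at a patience of 7. The following is a minimal, framework-free sketch of those two mechanisms. Only the patience value comes from the abstract; the base learning rate, `t_max`, `eta_min`, the choice of validation loss as the monitored quantity, and all names are assumptions made for illustration.

```python
import math


def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=7, min_delta=0.0):
        self.patience = patience    # 7 is the value stated in the abstract
        self.min_delta = min_delta  # assumed: any strict decrease counts as improvement
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses, patience=7):
    """Return the 0-based epoch at which early stopping triggers, or None."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None


# Hypothetical validation losses: improvement stops after epoch 2.
losses = [1.0, 0.8, 0.7] + [0.75] * 10
print("stop at epoch:", stopping_epoch(losses))
print("lr at epochs 0, 25, 50:",
      [cosine_annealing_lr(e, base_lr=1e-3, t_max=50) for e in (0, 25, 50)])
```

In a real training loop the same logic would normally come from the framework's own scheduler and a checkpointing callback, with the best weights restored when the stop is triggered.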

Paper Structure

This paper contains 13 sections, 19 figures, 4 tables.

Figures (19)

  • Figure 1: MRI Scans Showing Various Conditions: Glioma, Meningioma, a Non-Tumor Case, and Pituitary Tumor
  • Figure 2: Proposed System Architecture Integrating Five Pre-trained CNN Models with Explainable AI.
  • Figure 3: Architecture of the proposed CNN system design.
  • Figure 4: Customized lightweight CNN-Model performance during training and validation: (a) loss curves and (b) accuracy curves
  • Figure 5: Inception-ResNet V2 Model performance during training and validation: (a) loss curves and (b) accuracy curves
  • ...and 14 more figures