Learning to Select Like Humans: Explainable Active Learning for Medical Imaging

Ifrat Ikhtear Uddin, Longwei Wang, Xiao Qin, Yang Zhou, KC Santosh

TL;DR

This work proposes an explainability-guided active learning framework that integrates spatial attention alignment into a sample acquisition process and confirms that the models trained by the approach focus on diagnostically relevant regions, demonstrating that incorporating explanation guidance into sample acquisition yields superior data efficiency while maintaining clinical interpretability.

Abstract

Medical image analysis requires substantial labeled data for model training, yet expert annotation is expensive and time-consuming. Active learning (AL) addresses this challenge by strategically selecting the most informative samples for annotation, but traditional methods rely solely on predictive uncertainty and ignore whether models learn from clinically meaningful features, a critical requirement for clinical deployment. We propose an explainability-guided active learning framework that integrates spatial attention alignment into the sample acquisition process. Our approach uses a dual-criterion selection strategy combining: (i) classification uncertainty to identify informative examples, and (ii) attention misalignment with radiologist-defined regions-of-interest (ROIs) to target samples where the model focuses on incorrect features. By measuring misalignment between Grad-CAM attention maps and expert annotations using Dice similarity, our acquisition function identifies samples that enhance both predictive performance and spatial interpretability. We evaluate the framework on three expert-annotated medical imaging datasets: BraTS (MRI brain tumors), VinDr-CXR (chest X-rays), and SIIM-COVID-19 (chest X-rays). Using only 570 strategically selected samples, our explainability-guided approach consistently outperforms random sampling across all three datasets, achieving 77.22% accuracy on BraTS, 52.37% on VinDr-CXR, and 52.66% on SIIM-COVID-19. Grad-CAM visualizations confirm that models trained with our dual-criterion selection focus on diagnostically relevant regions, demonstrating that incorporating explanation guidance into sample acquisition yields superior data efficiency while maintaining clinical interpretability.
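
The abstract describes measuring misalignment between a Grad-CAM attention map and an expert ROI mask using Dice similarity. The sketch below shows one plausible way to compute such a quantity with NumPy. The function name `dice_misalignment`, the min-max normalisation, the binarisation threshold of 0.5, and the definition of misalignment as `1 - Dice` are all assumptions made for illustration; the abstract does not specify these details, and the attention map is assumed to be already computed and resized to the mask's shape.

```python
import numpy as np


def dice_misalignment(attention, roi_mask, threshold=0.5):
    """Return 1 - Dice between a binarised attention map and a binary ROI mask.

    Assumptions (not taken from the paper): the attention map is min-max
    normalised to [0, 1] and binarised at `threshold` before comparison.
    """
    attention = np.asarray(attention, dtype=float)
    roi = np.asarray(roi_mask).astype(bool)

    # Min-max normalise so the threshold is comparable across images.
    span = attention.max() - attention.min()
    if span > 0:
        norm = (attention - attention.min()) / span
    else:
        norm = np.zeros_like(attention)  # constant map: nothing is highlighted

    pred = norm >= threshold
    intersection = np.logical_and(pred, roi).sum()
    denom = pred.sum() + roi.sum()
    if denom == 0:
        # Both regions empty: treated here as perfectly aligned (assumption).
        return 0.0

    dice = 2.0 * intersection / denom
    return float(1.0 - dice)
```

Under these assumptions the value lies in [0, 1]: 0 when the highlighted region coincides with the ROI and 1 when the two regions are disjoint.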

Paper Structure (22 sections, 4 equations, 6 figures, 1 table, 1 algorithm)

Figures (6)

  • Figure 1: Traditional AL considers only uncertainty, so it misses samples where the model is confident but focuses on the wrong features. Our dual-criterion approach catches both failure modes.
  • Figure 2: Explainability-Guided Active Learning Framework. The framework operates in an iterative cycle with five key components: (1) Starting with an unlabeled pool $U=\{x_i\}$ where expert annotations are available for selected samples; (2) A composite scoring function that combines classification uncertainty $H(x)$ and explanation misalignment $D_{exp}(x)$; (3) Composite acquisition score $Score(x)$ where $\lambda$ balances uncertainty and misalignment; (4) Selection of top-K samples with highest composite scores, followed by expert annotation to obtain both class labels and diagnostic ROI masks for support and query sets; (5) Model fine-tuning with explanation-guided supervision using the newly acquired expert annotations. This iterative system progressively improves both classification accuracy and explanation quality while minimizing annotation costs.
  • Figure 3: Representative high-scoring samples selected by EG-AL across different failure patterns. Each example shows: input image with acquisition score, expert annotation, and model attention. Top-left: BraTS case (score 1.079) with small tumor exhibiting high uncertainty and severe attention misalignment. Top-right: VinDr-CXR case (score 1.019) with multiple bounding boxes where model attention scatters across irrelevant regions. Bottom row: Two VinDr-CXR cases showing subtle single opacity (left, score 1.065) and multiple distributed abnormalities (right, score 1.011), both with poor spatial alignment. Our dual-criterion scoring systematically identifies samples where models exhibit classification uncertainty, spatial misalignment, or both.
  • Figure 4: Attention alignment from models trained with EG-AL-selected samples. Each triplet shows input image, expert annotation, and Grad-CAM (left to right). Top: BraTS tumor cases with small and irregular boundaries. Middle: VinDr-CXR cases with single and multiple thoracic abnormalities. Bottom: SIIM-COVID cases with varying lung opacities. Models consistently localize expert-defined regions regardless of size, number, or complexity, validating that dual-criterion sample selection produces clinically aligned attention.
  • Figure 5: Progressive performance improvement on BraTS across active learning iterations. Top: Accuracy comparison. Bottom: Macro AUC comparison. Both methods start from the same baseline (iteration 0) and select 60 samples per round. EG-AL shows a consistent upward trajectory in both metrics, while random sampling fluctuates more in both accuracy (top-left) and macro AUC (bottom-left). Shaded regions indicate standard deviation over 5 random seeds, showing EG-AL's superior stability.
  • ...and 1 more figures
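
The Figure 2 caption describes a composite acquisition score that combines classification uncertainty $H(x)$ with explanation misalignment $D_{exp}(x)$, balanced by $\lambda$, followed by top-K selection. The sketch below illustrates that selection step under stated assumptions: the exact form of $Score(x)$ is not given in this excerpt, so a linear combination $H(x) + \lambda D_{exp}(x)$ is assumed, and $H(x)$ is assumed to be the Shannon entropy of the softmax output. The function names `predictive_entropy`, `composite_score`, and `select_top_k` are hypothetical, and the misalignment values are taken as precomputed inputs.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of each row of class probabilities (assumed form of H(x))."""
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log so zero probabilities contribute exactly zero.
    return -np.sum(probs * np.log(np.clip(probs, eps, 1.0)), axis=-1)


def composite_score(entropy, misalignment, lam=1.0):
    """Assumed linear combination: Score(x) = H(x) + lam * D_exp(x)."""
    return np.asarray(entropy, dtype=float) + lam * np.asarray(misalignment, dtype=float)


def select_top_k(scores, k):
    """Indices of the k highest-scoring pool samples, best first."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)
    order = np.argsort(-scores, kind="stable")  # descending; ties keep pool order
    return order[:k]


# Toy pool of three samples: class probabilities and precomputed misalignment.
probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
misalignment = [0.0, 1.0, 0.2]

scores = composite_score(predictive_entropy(probs), misalignment, lam=1.0)
chosen = select_top_k(scores, k=2)
```

In this toy pool the second sample is fairly confident yet fully misaligned, so it outranks the maximally uncertain first sample. This is the "confident but focused on the wrong features" case that the Figure 1 caption says uncertainty-only selection misses. Setting `lam=0` reduces the ranking to plain entropy-based selection.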