MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

Patrick Wienholt, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn

TL;DR

By providing explicit, reliable explanations accessible even to non-AI experts, MedicalPatchNet mitigates risks associated with shortcut learning, thus improving clinical trust and contributing to safer, explainable AI-assisted diagnostics across medical imaging domains.

Abstract

Deep neural networks excel in radiological image classification but frequently suffer from poor interpretability, limiting clinical acceptance. We present MedicalPatchNet, an inherently self-explainable architecture for chest X-ray classification that transparently attributes decisions to distinct image regions. MedicalPatchNet splits images into non-overlapping patches, independently classifies each patch, and aggregates the predictions, enabling intuitive visualization of each patch's diagnostic contribution without post-hoc techniques. Trained on the CheXpert dataset (223,414 images), MedicalPatchNet matches the classification performance of EfficientNetV2-S (AUROC 0.907 vs. 0.908) while improving interpretability: on the CheXlocalize dataset, it achieves higher pathology localization accuracy (mean hit-rate 0.485 vs. 0.376 with Grad-CAM). By providing explicit, reliable explanations accessible even to non-AI experts, MedicalPatchNet mitigates risks associated with shortcut learning, thus improving clinical trust and contributing to safer, explainable AI-assisted diagnostics across medical imaging domains. The model is publicly available with reproducible training and inference scripts: https://github.com/TruhnLab/MedicalPatchNet
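
The patch-wise pipeline described in the abstract can be illustrated with a minimal sketch. It assumes a single class, a 2D image stored as nested lists, and aggregation by averaging raw patch logits followed by a sigmoid, as the Figure 2 caption describes. The names `split_into_patches`, `toy_patch_logit`, and `classify_image` are hypothetical, and the toy scoring function merely stands in for the shared patch classifier; this is not the authors' implementation.

```python
import math


def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return [
        [row[left:left + patch_size] for row in image[top:top + patch_size]]
        for top in range(0, height, patch_size)
        for left in range(0, width, patch_size)
    ]


def toy_patch_logit(patch):
    """Hypothetical stand-in for the shared patch classifier (not a CNN)."""
    pixels = [value for row in patch for value in row]
    return sum(pixels) / len(pixels) - 0.5


def classify_image(image, patch_size, patch_logit_fn=toy_patch_logit):
    # Every patch is scored independently by the same function.
    patch_logits = [patch_logit_fn(p) for p in split_into_patches(image, patch_size)]
    # Aggregate by averaging the raw logits, then apply the sigmoid.
    mean_logit = sum(patch_logits) / len(patch_logits)
    probability = 1.0 / (1.0 + math.exp(-mean_logit))
    # Scaled patch logits: raw logits multiplied by the image-level probability.
    scaled_logits = [probability * logit for logit in patch_logits]
    return probability, patch_logits, scaled_logits


# Toy 4x4 image whose top-left 2x2 patch is bright (assumed example data).
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
probability, patch_logits, scaled_logits = classify_image(image, patch_size=2)
```

Because the image-level logit is the plain mean of the patch logits, the list `patch_logits` is itself the saliency map: each entry is exactly that patch's contribution to the decision, with no gradient-based attribution step.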

Paper Structure

This paper contains 16 sections, 2 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Evaluation framework for classification and localization. Models are trained on the CheXpert dataset using image-level classification labels only. Evaluation is performed on CheXlocalize, which provides image-level labels and radiologist-annotated pixel-wise segmentation masks (for 10 findings). We report both classification performance (e.g., AUROC) and localization performance. For localization, saliency maps from MedicalPatchNet are compared against those of a standard EfficientNet baseline explained by post-hoc methods (Grad-CAM, Grad-CAM++, and Eigen-CAM). Localization is quantified by (i) hit rate, i.e., the fraction of cases where the pixel with maximum attribution/saliency lies inside the corresponding ground-truth segmentation mask, and (ii) mean Intersection over Union (mIoU) between a binarized localization map and the ground-truth mask. Both metrics are illustrated on the right.
  • Figure 2: Initially, the image is divided into patches (a), and each patch is independently processed by the same EfficientNetV2-S (b). The resulting patch logits, i.e., raw pre-sigmoid class scores (c), are averaged (d). After applying the sigmoid activation function, the output (e) provides the final classification results of MedicalPatchNet. Multiplying the raw patch logits by the classification probabilities generates scaled patch logits (f). A saliency map can be derived either from raw patch logits (g) or scaled patch logits (h), illustrating each patch’s contribution to the final decision.
  • Figure 3: The saliency map shown in (a) illustrates the influence of each patch on the final classification of pleural effusion. Red patches have large positive logits and "vote" for the class; blue patches have large negative logits and "vote" against it. Patches shown in light grey or white have logits close to zero and therefore contribute only minimally to the final decision (effectively "abstaining" from the vote). Image (a) shows a direct visualization of the patch logits from one forward pass. Shifting the image and averaging the resulting saliency maps produces smoother maps, as seen in (b), (c), and (d), although this requires more forward passes.
  • Figure 4: Comparison of the Area Under the Receiver Operating Characteristic (AUROC) curves for MedicalPatchNet and EfficientNetV2-S, indicating similar classification performance. On the CheXlocalize dataset (Saporta et al., 2022), the two models yield mean AUROCs of 0.907 and 0.908, respectively; on the full 14-class CheXpert dataset (Irvin et al., 2019), the AUROCs are 0.902 and 0.911, respectively.
  • Figure 5: Representative saliency maps produced by MedicalPatchNet and three post-hoc methods. Each row displays the same chest X-ray for a given pathology together with its ground-truth label ("True" or "False"). Columns compare MedicalPatchNet’s raw patch logits with Grad-CAM, Grad-CAM++, and Eigen-CAM applied to an EfficientNetV2-S baseline. In the MedicalPatchNet maps, red denotes evidence supporting the class and blue denotes evidence against it, whereas Grad-CAM–based maps visualize only positive (red) contributions. Eigen-CAM is class-agnostic and therefore does not generate class-specific saliency maps. Interestingly, for the wrongly diagnosed pneumothorax, all four explainability methods point to the chest tube, revealing that the model used a shortcut, with MedicalPatchNet tracing the tube's course most clearly.
  • ...and 8 more figures
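
The two localization metrics defined in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions: saliency maps and masks are nested lists of equal size, the map is binarized with a fixed threshold (the caption does not say how binarization is done, so `threshold` is a hypothetical parameter), and an empty union is scored as 0.0. The function names and the toy data are illustrative and do not come from the paper or its code.

```python
def is_hit(saliency, mask):
    """True if the pixel with maximum saliency lies inside the ground-truth mask."""
    coords = [(r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))]
    row, col = max(coords, key=lambda rc: saliency[rc[0]][rc[1]])
    return bool(mask[row][col])


def iou(saliency, mask, threshold):
    """Intersection over Union between the thresholded saliency map and the mask."""
    intersection = union = 0
    for saliency_row, mask_row in zip(saliency, mask):
        for value, label in zip(saliency_row, mask_row):
            predicted = value >= threshold  # assumed binarization rule
            intersection += int(predicted and bool(label))
            union += int(predicted or bool(label))
    return intersection / union if union else 0.0  # assumed convention for empty union


def hit_rate(cases):
    """Fraction of (saliency, mask) cases whose maximum falls inside the mask."""
    return sum(is_hit(s, m) for s, m in cases) / len(cases)


def mean_iou(cases, threshold):
    """Average IoU over all (saliency, mask) cases."""
    return sum(iou(s, m, threshold) for s, m in cases) / len(cases)


# Toy data (assumed): one well-localized map and one that peaks outside the mask.
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
good_map = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.6], [0.1, 0.6, 0.3]]
bad_map = [[0.9, 0.1, 0.1], [0.1, 0.2, 0.1], [0.1, 0.1, 0.1]]
cases = [(good_map, mask), (bad_map, mask)]

example_hit_rate = hit_rate(cases)
example_miou = mean_iou(cases, threshold=0.5)
```

In this toy example the first map peaks inside the mask and the second does not, so the hit rate is 0.5. The first map overlaps two of the three pixels in the union and the second overlaps none, giving a mean IoU of 1/3.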