Weakly-supervised segmentation using inherently-explainable classification models and their application to brain tumour classification

Soumick Chatterjee; Hadya Yassin; Florian Dubost; Andreas Nürnberger; Oliver Speck

TL;DR

This work tackles trustworthy, weakly-supervised brain tumour segmentation by introducing inherently explainable Global Pooling (GP) CNN backbones (GP-UNet, GP-ShuffleUNet, GP-ReconResNet), in which a global pooling mechanism turns localisation heatmaps into the classification output. Because the heatmaps drive the classification, they yield direct, interpretable segmentations from image-level labels alone, reducing annotation burden while maintaining competitive F1-scores and near-equivalent diagnostic separability on two brain-tumour datasets, including BraTS 2020. The approach outperforms a strong interpretable baseline (MProtoNet) in tumour-specific segmentation and handles class imbalance robustly, as shown by ROC and Precision-Recall analyses; its heatmaps also align with occlusion and guided backpropagation interpretations. Limitations include 2D slice processing and longer training times. The study nevertheless points to a promising path toward trustworthy, efficient clinical decision support, with future work spanning 3D extensions, federated learning, and integration with Vision-Language Models for radiological reporting.

Abstract

Deep learning has demonstrated significant potential in medical imaging; however, the opacity of "black-box" models hinders clinical trust, while segmentation tasks typically necessitate laborious, hard-to-obtain pixel-wise annotations. To address these challenges simultaneously, this paper introduces a framework of three inherently explainable classifiers (GP-UNet, GP-ShuffleUNet, and GP-ReconResNet). By integrating a global pooling mechanism, these networks generate localisation heatmaps that directly influence classification decisions, offering inherent interpretability without relying on potentially unreliable post-hoc methods. These heatmaps are subsequently thresholded to achieve weakly-supervised segmentation, requiring only image-level classification labels for training. Validated on two datasets for multi-class brain tumour classification, the proposed models achieved a peak F1-score of 0.93. For the weakly-supervised segmentation task, a median Dice score of 0.728 (95% CI 0.715-0.739) was recorded. Notably, on a subset of tumour-only images, the best model achieved an accuracy of 98.7%, outperforming state-of-the-art glioma grading binary classifiers. Furthermore, comparative Precision-Recall analysis validated the framework's robustness against severe class imbalance, establishing a direct correlation between diagnostic confidence and segmentation fidelity. These results demonstrate that the proposed framework successfully combines high diagnostic accuracy with essential transparency, offering a promising direction for trustworthy clinical decision support. Code is available on GitHub: https://github.com/soumickmj/GPModels
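
The way a global pooling layer makes the heatmaps "directly influence classification decisions" can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that): it assumes one 2D heatmap per class and uses global max pooling as the pooling operator, which is an assumption made here for illustration. The class names and heatmap values are hypothetical.

```python
def global_max_pool(heatmap):
    """Reduce a 2D heatmap (a list of rows) to a single scalar score.

    Assumption: max pooling is used here; the pooling operator in the
    actual GP models may differ.
    """
    return max(value for row in heatmap for value in row)


def classify_from_heatmaps(heatmaps):
    """Pool every per-class heatmap to one score and pick the top class.

    heatmaps: dict mapping a class label to its 2D heatmap.
    Returns (predicted_label, scores_per_label).
    """
    scores = {label: global_max_pool(hm) for label, hm in heatmaps.items()}
    predicted_label = max(scores, key=scores.get)
    return predicted_label, scores


# Hypothetical per-class heatmaps, standing in for the network output.
heatmaps = {
    "class_a": [[-0.2, 0.1], [0.0, 0.3]],
    "class_b": [[0.1, 2.5], [1.9, -0.4]],
}
predicted, scores = classify_from_heatmaps(heatmaps)
print(predicted, scores)
```

Because the class score is a pooled summary of the heatmap, the pixels that raise the score are the same pixels that are highlighted, which is what makes the explanation inherent rather than post hoc.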

Paper Structure (24 sections, 20 figures, 7 tables)

Introduction
Related Work
Classification Models
Segmentation Models
Combined Models
Motivation and Contributions
Methodology
Network Models
Implementation
Dataset
Augmentation
Evaluation Criteria
Results
Experiments with Dataset #1
Experiments with Dataset #2: BraTS 2020 dataset
...and 9 more sections

Figures (20)

Figure 1: Workflow of the GP-models
Figure 2: The network architecture of the baseline GP-UNet model [Dubost.5222017], modified by changing the up-pooling mechanism from transposed convolution to interpolation followed by convolution (Sinc Up-sample method) and by changing the number of output convolution filters from $2^{5+i}$ to $2^{6+i}$, with $i = 0, 1, 2$ corresponding to the original depth of the model (depth = 3). A dropout layer with a probability of 0.5 was added to the model during training.
Figure 3: The Network Architecture of the proposed GP-ReconResNet model. When training and testing on the BraTS 2020 dataset, a dropout layer with a probability value of 0.5 instead of 0.2 was used in the model.
Figure 4: The network architecture of the proposed GP-ShuffleUNet model [chatterjee2021shuffleunet]. The original model was altered by adding a global pooling layer before the fully connected convolution in the training stage, transforming the model into a GP-Model, and by adding a dropout layer with a probability of 0.5.
Figure 5: Example results of the best-performing GP-ReconResNet on the first dataset [JunCheng.2017] in (a) axial, (b) sagittal, and (c) coronal orientations. The first column contains the input slices. The second column contains the model's raw heatmaps, in which red areas influenced the classification outcome negatively and blue areas influenced it favourably. The third column contains the suppressed heatmaps, in which negative values are suppressed so that only positive attributions remain. The fourth column contains the final network-generated masks, obtained by applying Otsu thresholding to the suppressed heatmap, compared against the ground-truth mask: white indicates true segmentation, blue indicates over-segmentation, and red indicates under-segmentation. The fifth column contains the ground-truth mask.
...and 15 more figures
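
The post-processing described in the Figure 5 caption (suppress negative values, apply Otsu thresholding, compare with the ground truth) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' code: the histogram-based Otsu routine, the bin count, the helper names, and the toy heatmap and ground-truth mask are all hypothetical.

```python
def suppress_negatives(heatmap):
    """Keep positive attributions only by clamping negative values to zero."""
    return [[max(v, 0.0) for v in row] for row in heatmap]


def otsu_threshold(values, bins=64):
    """Histogram-based Otsu: pick the cut maximising between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant image: no meaningful split
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, lo + (i + 1) * width
    return best_t


def dice(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|)."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(1 for p, t in pairs if p and t)
    size = sum(1 for p, _ in pairs if p) + sum(1 for _, t in pairs if t)
    return 1.0 if size == 0 else 2.0 * inter / size


def compare_masks(pred, truth):
    """Colour coding from the caption: white = agreement on tumour,
    blue = over-segmentation, red = under-segmentation, '.' = background."""
    names = {(True, True): "white", (True, False): "blue",
             (False, True): "red", (False, False): "."}
    return [[names[(bool(p), bool(t))] for p, t in zip(pr, tr)]
            for pr, tr in zip(pred, truth)]


# Hypothetical raw heatmap and ground-truth mask for a 4x4 slice.
raw = [[-0.3, 0.0, 0.0, -0.1],
       [0.0, 0.9, 1.0, 0.0],
       [0.0, 1.0, 0.9, 0.0],
       [-0.2, 0.0, 0.0, 0.0]]
truth = [[False, False, False, False],
         [False, True, True, True],
         [False, True, True, False],
         [False, False, False, False]]

suppressed = suppress_negatives(raw)
threshold = otsu_threshold([v for row in suppressed for v in row])
mask = [[v > threshold for v in row] for row in suppressed]
score = dice(mask, truth)
comparison = compare_masks(mask, truth)
print(round(score, 3))
```

In this toy case the predicted mask misses one ground-truth pixel, so that pixel is marked red (under-segmentation) and no pixel is marked blue.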
