Leveraging CAM Algorithms for Explaining Medical Semantic Segmentation

Tillmann Rheude, Andreas Wirtz, Arjan Kuijper, Stefan Wesarg

TL;DR

The paper tackles the explainability of CNN-based semantic segmentation, with emphasis on medical imaging, where locating decision-relevant regions is crucial. It introduces Seg-HiRes-Grad CAM, a transfer of the classification-based HiRes CAM to segmentation that computes heatmaps from the deepest feature maps using a pixel set $\mathcal{M}$, i.e., $L^c_{\text{Seg-HiRes-Grad CAM}} = \sum_k \alpha_c^k A^k$, where $\alpha_c^k = \frac{\partial y^{c,\text{new}}}{\partial A^k}$ and $y^{c,\text{new}} = \sum_{(i,j) \in \mathcal{M}} y^c_{i,j}$. Because $\alpha_c^k$ is here the full gradient map rather than its spatial average, its product with $A^k$ is taken element-wise. This approach preserves gradient-based weighting while enabling segmentation-specific aggregation, improving saliency localization over Seg-Grad CAM. Evaluations on medical and non-medical datasets show more accurate saliency localization, though runtime and resolution constraints remain. The work motivates transferring further classification-based CAM methods to segmentation and adding richer quantitative comparisons for a deeper evaluation of explainability in medical image segmentation.
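To make the difference between the two weighting schemes concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the activations $A^k$ and the gradients $\partial y^{c,\text{new}} / \partial A^k$ are already available as arrays of shape `(K, D1, D2)`. The function names and the toy setup are hypothetical: the toy treats the class score as a per-pixel weighted sum of the feature maps (a 1x1 linear head), so the gradient is simply the head weight inside $\mathcal{M}$ and zero elsewhere. A real segmentation network would obtain the gradients by backpropagation.

```python
import numpy as np

def seg_grad_cam(activations, gradients):
    """Seg-Grad CAM style: one scalar weight per feature map (spatial mean of the gradient)."""
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    heatmap = np.tensordot(weights, activations, axes=1)   # shape (D1, D2)
    return np.maximum(heatmap, 0.0)                        # ReLU

def seg_hires_grad_cam(activations, gradients):
    """Seg-HiRes-Grad CAM style: element-wise gradient * activation, summed over k."""
    heatmap = (gradients * activations).sum(axis=0)        # shape (D1, D2)
    return np.maximum(heatmap, 0.0)                        # ReLU

# Toy setup (assumption): K = 2 constant feature maps on a 4x4 grid.
activations = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 2.0)])
head_weights = np.array([1.0, 0.5])   # hypothetical 1x1 linear head for class c

# Pixel set M: the top-left 2x2 block.
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

# For the linear toy head, d y^{c,new} / d A^k_{ij} = w_k if (i, j) in M, else 0.
gradients = head_weights[:, None, None] * mask

sgc = seg_grad_cam(activations, gradients)
shrgc = seg_hires_grad_cam(activations, gradients)
```

In this toy case the averaged weights spread a uniform value over the whole grid, while the element-wise product is non-zero only inside $\mathcal{M}$. This is an illustration of why the element-wise variant can localize better under these assumptions, not a reproduction of the paper's results.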

Abstract

Convolutional neural networks (CNNs) currently achieve leading results in segmentation tasks and represent the state of the art for image-based analysis. However, the exact decision-making process of a CNN remains largely unknown. The research area of explainable artificial intelligence (xAI) primarily revolves around understanding and interpreting this black-box behavior. One way of interpreting a CNN is to use class activation maps (CAMs), heatmaps that indicate how important each image area is for the CNN's prediction. For classification tasks, a variety of CAM algorithms exist. For segmentation tasks, however, only one CAM algorithm for interpreting the output of a CNN exists. We propose a transfer between existing classification- and segmentation-based methods for more detailed, explainable, and consistent results that show salient pixels in semantic segmentation tasks. The resulting Seg-HiRes-Grad CAM extends the segmentation-based Seg-Grad CAM by transferring it to the classification-based HiRes CAM. Our method improves on the existing segmentation-based method by adapting it to recently published classification-based methods. Especially for medical image segmentation, this transfer resolves existing explainability disadvantages.

Paper Structure

This paper contains 7 sections, 6 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Calculation flow of Seg-Grad CAM [Vinogradova et al., 2020] (top row) and our proposed Seg-HiRes-Grad CAM (bottom row), based on the computations and flowchart of HiRes CAM [Draelos and Carin, 2021], with $K$ feature maps, dimensions $D_1$ and $D_2$ of the feature maps, and the average weight values $\alpha_c^k$. Gradients (blue) are multiplied with activation maps (green). The sum (red) of the products (yellow) is upscaled to give the final heatmap, shown before ReLU and after ReLU (above the red square). On the left, the respective input image and the semantic segmentation produced by the U-Net [Ronneberger et al., 2015] are shown. The outputs of the two methods differ strikingly. ReLU is not visualized to ensure better comparability.
  • Figure 2: Comparison between Seg-Grad CAM [Vinogradova et al., 2020] (e) and Seg-HiRes-Grad CAM (f). In this case, $\mathcal{M}$ equals the respective pixels of the prediction (c) for the car class (d), which is similar to the ground truth (b). The input image (a) from the Cityscapes dataset [Cordts et al., 2016] is used because Vinogradova et al. [2020] use it.
  • Figure 3: Comparison between Seg-Grad CAM [Vinogradova et al., 2020] (e) and Seg-HiRes-Grad CAM (f) for the upper right wisdom tooth (blue segmentation) (d). In this case, $\mathcal{M}$ equals the respective pixels of the prediction (c, d), which is similar to the ground truth (b). The input image (a) comes from the dataset of Jader et al. [2018].
  • Figure 4: Comparison between Seg-Grad CAM [Vinogradova et al., 2020] (c, g) and Seg-HiRes-Grad CAM (d, h) for a tumor (b) and a kidney (f). The input image comes from the KiTS23 dataset [Heller, Isensee et al., 2020].
  • Figure 5: Seg-Grad CAM (SGC) [Vinogradova et al., 2020] (e) and Seg-HiRes-Grad CAM (SHRGC) (f) for the traffic sign class of Cityscapes [Cordts et al., 2016].
  • ...and 3 more figures
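
The captions for Figures 2 and 3 take $\mathcal{M}$ to be the pixels that the network predicts as the class of interest, and Figure 1 mentions upscaling the summed map to obtain the final heatmap. The sketch below shows one way these two steps could look in NumPy. The helper names are hypothetical, the logits are made-up toy values, and nearest-neighbour upscaling is an assumption, since the interpolation used in the paper is not stated here.

```python
import numpy as np

def pixel_set_from_prediction(logits, target_class):
    """Boolean mask M: pixels whose arg-max class equals target_class."""
    prediction = logits.argmax(axis=0)        # shape (H, W)
    return prediction == target_class

def aggregated_score(logits, mask, target_class):
    """y^{c,new}: sum of the class-c scores over the pixel set M."""
    return float(logits[target_class][mask].sum())

def upscale_nearest(heatmap, factor):
    """Nearest-neighbour upscaling (assumption: interpolation choice is illustrative)."""
    return np.repeat(np.repeat(heatmap, factor, axis=0), factor, axis=1)

# Toy logits of shape (C, H, W) with C = 2 classes on a 2x2 image.
logits = np.array([
    [[2.0, 0.1], [0.3, 0.2]],   # class 0
    [[0.5, 1.5], [1.0, 0.9]],   # class 1
])

mask = pixel_set_from_prediction(logits, target_class=1)
score = aggregated_score(logits, mask, target_class=1)

# Upscale a coarse 2x2 heatmap to 4x4, as one would to match the input size.
coarse = np.array([[0.0, 1.0], [2.0, 3.0]])
fine = upscale_nearest(coarse, factor=2)
```

Here three of the four pixels are predicted as class 1, so `mask` selects those three and `score` sums their class-1 logits. In a real pipeline this scalar would be the quantity that is backpropagated to the feature maps.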