MAProtoNet: A Multi-scale Attentive Interpretable Prototypical Part Network for 3D Magnetic Resonance Imaging Brain Tumor Classification

Binghua Li, Jie Mao, Zhe Sun, Chao Li, Qibin Zhao, Toshihisa Tanaka

TL;DR

MAProtoNet addresses the critical need for precise and interpretable localization in 3D MRI brain-tumor classification. It extends prototypical part networks with a quadruplet attention mechanism and a concise multi-scale module, supervised by a novel multi-scale mapping loss, to produce pixel-level attribution maps without extra segmentation labels. Empirical results on BraTS datasets show an overall improvement of approximately 4% in activation precision over baselines (best score 85.8%), highlighting the value of combining multi-scale spatial–channel interactions with scale-aware supervision. The work advances interpretable medical imaging by delivering stronger localization while maintaining classification performance, and its code is to be released publicly for reproducibility.

Abstract

Automated diagnosis with artificial intelligence has emerged as a promising area in medical imaging, yet the interpretability of the deep neural networks it relies on remains an urgent concern. Although recent works such as XProtoNet and MProtoNet have sought to design interpretable prediction models to address this issue, the localization precision of their attribution maps leaves room for improvement. To this end, we propose a Multi-scale Attentive Prototypical part Network, termed MAProtoNet, that provides more precise attribution maps. Specifically, we introduce a concise multi-scale module that merges attentive features from quadruplet attention layers and produces attribution maps. The proposed quadruplet attention layers enhance the existing online class activation mapping loss by capturing interactions between the spatial and channel dimensions, while the multi-scale module fuses fine-grained and coarse-grained information to generate precise maps. We also apply a novel multi-scale mapping loss to supervise the proposed multi-scale module. Compared with existing interpretable prototypical part networks in medical imaging, MAProtoNet achieves state-of-the-art localization performance on the brain tumor segmentation (BraTS) datasets, with an overall improvement of approximately 4% in activation precision score (best score: 85.8%), without using additional segmentation annotations. Our code will be released at https://github.com/TUAT-Novice/maprotonet.
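The activation precision score quoted above measures how much of a model's strongest attribution falls inside the annotated tumor. This page does not spell out the formula, so the sketch below assumes a simple form: threshold the attribution map at a high quantile and report the fraction of above-threshold voxels that lie inside the tumor mask. The quantile value and the function name `activation_precision` are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def activation_precision(attribution: Sequence[float],
                         tumor_mask: Sequence[int],
                         quantile: float = 0.95) -> float:
    """Fraction of strongly activated voxels that lie inside the tumor mask.

    Both inputs are flattened 3D volumes with one entry per voxel. Voxels whose
    attribution is at or above the `quantile`-th sorted value are treated as
    activated; the default quantile is an illustrative assumption.
    """
    if len(attribution) != len(tumor_mask):
        raise ValueError("attribution and mask must cover the same voxels")
    if not attribution:
        raise ValueError("empty volume")
    ranked = sorted(attribution)
    # Position of the threshold value inside the sorted attributions.
    k = min(int(quantile * len(ranked)), len(ranked) - 1)
    threshold = ranked[k]
    activated = [a >= threshold for a in attribution]
    hits = sum(1 for act, m in zip(activated, tumor_mask) if act and m)
    # At least one voxel (the threshold value itself) is always activated.
    return hits / sum(activated)


if __name__ == "__main__":
    attr = [0.1, 0.2, 0.9, 0.8, 0.05, 0.95, 0.3, 0.7, 0.6, 0.4]
    mask = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    # With quantile=0.7 the top three voxels (0.95, 0.9, 0.8) are activated,
    # and two of them fall inside the mask.
    print(activation_precision(attr, mask, quantile=0.7))
```

Under this reading, a higher score means the attribution map concentrates on the tumor rather than on surrounding tissue, which is the sense in which the abstract reports localization gains.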

Paper Structure (25 sections, 14 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: To enhance the localization capability of attribution maps, we propose to introduce quadruplet attention and multi-scale features into the prototypical part network.
  • Figure 2: Framework of our proposed MAProtoNet. The backbone $H$ is mainly used for feature extraction. Green marks MProtoNet's enhancements over XProtoNet, whereas blue marks the further improvements introduced by our MAProtoNet. In our architecture, quadruplet attention blocks, denoted by $Q$, are introduced, and multi-scale features are fused in a novel multi-scale module to generate pixel-level maps. Although the figure illustrates $n_{scale}=3$, we set $n_{scale}=2$ in practice because of the shallower backbone.
  • Figure 3: Illustration of the proposed quadruplet attention. Distinct interactions are extracted via four different branches and fused by averaging.
  • Figure 4: Architecture of the multi-scale module. For the coarse-grained features, we down-sample via convolutional or pooling layers and fuse them by addition or concatenation, which yields four architectures: (a) $Conv + Concat$; (b) $Conv + Add$; (c) $Pool + Concat$; and (d) $Pool + Add$. The convolutional layers in (a) and (b) use a $stride$ greater than 1 to decrease spatial resolution, whereas those in (d) are channel-wise with $kernel\_size=1$, specifically to keep the channel dimension consistent.
  • Figure 5: Visualization results of attribution maps. All examples are from subjects in the BraTS 2020 dataset, shown as MRI slices of the T1CE modality. MAProtoNet localizes tumors more precisely than the baseline methods.
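Figure 3 describes quadruplet attention only at the level of four branches fused by averaging. The NumPy sketch below illustrates that structure on a single (C, D, H, W) feature map under a hypothetical reading modeled on triplet attention extended to 3D: each branch collapses one of the four axes with max and mean pooling, turns the pooled map into a sigmoid gate, and rescales the input; the four rescaled tensors are then averaged. The learnable convolution a real branch would apply to the pooled map is replaced here by a fixed average, and `quadruplet_attention_sketch` is a hypothetical name, not the authors' implementation.

```python
import numpy as np


def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def quadruplet_attention_sketch(x: np.ndarray) -> np.ndarray:
    """Four-branch gating on a (C, D, H, W) feature map, fused by averaging.

    Axis 0 collapses channels, giving a purely spatial (D, H, W) gate. Axes 1-3
    each collapse one spatial axis, so the gate couples channels with the two
    remaining spatial axes. A fixed average of the max- and mean-pooled maps
    stands in for each branch's learnable convolution (hypothetical).
    """
    if x.ndim != 4:
        raise ValueError("expected a (C, D, H, W) feature map")
    branches = []
    for axis in range(4):
        # Max- and mean-pool along one axis, then mix them with fixed weights.
        pooled = 0.5 * (x.max(axis=axis) + x.mean(axis=axis))
        gate = _sigmoid(pooled)  # every gate value lies in (0, 1)
        # Re-insert the collapsed axis so the gate broadcasts over it.
        branches.append(x * np.expand_dims(gate, axis=axis))
    # Fuse the four branches by averaging, as in Figure 3.
    return np.mean(branches, axis=0)


if __name__ == "__main__":
    feats = np.random.default_rng(0).standard_normal((4, 3, 5, 6))
    print(quadruplet_attention_sketch(feats).shape)  # (4, 3, 5, 6)
```

Because every gate lies in (0, 1), the output keeps the sign of each input voxel and never exceeds it in magnitude; a trained layer would differ mainly in how the convolution shapes each gate.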
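Figure 4 contrasts convolution-based and pooling-based down-sampling combined with concatenation or addition. A minimal sketch of the two pooling variants, (c) Pool + Concat and (d) Pool + Add, follows. It assumes that the finer feature map is brought to the coarser map's grid by non-overlapping 2x2x2 average pooling (pooling type and factor are assumptions) and that the channel-wise convolution in (d) is a 1x1x1 convolution without bias, i.e. a matrix product over channels. The function names and the weight values are hypothetical.

```python
from typing import Optional

import numpy as np


def avg_pool3d_2x(x: np.ndarray) -> np.ndarray:
    """Halve D, H and W of a (C, D, H, W) map with non-overlapping 2x2x2 average pooling."""
    c, d, h, w = x.shape
    if d % 2 or h % 2 or w % 2:
        raise ValueError("spatial sizes must be even for 2x2x2 pooling")
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(2, 4, 6))


def conv1x1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Channel-wise convolution (kernel_size=1): weight has shape (C_out, C_in), no bias."""
    return np.einsum("oc,cdhw->odhw", weight, x)


def fuse_pool(fine: np.ndarray, coarse: np.ndarray, mode: str,
              weight: Optional[np.ndarray] = None) -> np.ndarray:
    """Fuse a finer map into a coarser one (hypothetical variants (c) and (d))."""
    pooled = avg_pool3d_2x(fine)  # bring the finer map to the coarser grid
    if pooled.shape[1:] != coarse.shape[1:]:
        raise ValueError("pooled fine map and coarse map must share D, H, W")
    if mode == "concat":  # (c) Pool + Concat: stack along the channel axis
        return np.concatenate([pooled, coarse], axis=0)
    if mode == "add":  # (d) Pool + Add: match channel counts first, then sum
        if weight is None:
            raise ValueError("mode='add' needs a (C_coarse, C_fine) weight")
        return conv1x1x1(pooled, weight) + coarse
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fine = rng.standard_normal((8, 4, 6, 6))     # finer scale: 8 channels
    coarse = rng.standard_normal((16, 2, 3, 3))  # coarser scale: 16 channels
    w = rng.standard_normal((16, 8))             # hypothetical 1x1x1 conv weights
    print(fuse_pool(fine, coarse, "concat").shape)         # (24, 2, 3, 3)
    print(fuse_pool(fine, coarse, "add", weight=w).shape)  # (16, 2, 3, 3)
```

Concatenation keeps both feature sets at the cost of more channels, while addition keeps the channel count fixed, which is why (d) needs the 1x1x1 convolution to align channels before summing.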