TransGUNet: Transformer Meets Graph-based Skip Connection for Medical Image Segmentation

Ju-Hyeon Nam, Nur Suriza Syazwany, Sang-Chul Lee

TL;DR

TransGUNet tackles the semantic gap in skip connections for medical image segmentation by introducing an attentional cross-scale graph neural network (ACS-GNN) paired with entropy-driven feature selection (EFS) to produce reliable spatial attention. By converting cross-scale features into graphs and applying node-level attention with MRConv, alongside EFS-filtered channels, the approach captures complex anatomical structures with efficient global and local fusion. Empirical results across six seen and eight unseen datasets show TransGUNet achieves state-of-the-art performance with about 25 million parameters and around 10 GFLOPs, outperforming transformer- and CNN-based skip-connections while maintaining efficiency. The work offers a robust, generalizable, and memory-conscious framework that could improve clinical segmentation tasks across modalities, with potential for deployment in real-world healthcare settings.

Abstract

Skip connection engineering is primarily employed to address the semantic gap between the encoder and decoder, while also integrating global dependencies to understand the relationships among complex anatomical structures in medical image segmentation. Although several models have proposed transformer-based approaches to incorporate global dependencies within skip connections, they often face limitations in capturing detailed local features with high computational complexity. In contrast, graph neural networks (GNNs) exploit graph structures to effectively capture local and global features. Leveraging these properties, we introduce an attentional cross-scale graph neural network (ACS-GNN), which enhances the skip connection framework by converting cross-scale feature maps into a graph structure and capturing complex anatomical structures through node attention. Additionally, we observed that deep learning models often produce uninformative feature maps, which degrades the quality of spatial attention maps. To address this problem, we integrated entropy-driven feature selection (EFS) with spatial attention, calculating an entropy score for each channel and filtering out high-entropy feature maps. Our innovative framework, TransGUNet, comprises ACS-GNN and EFS-based spatial attention to effectively enhance domain generalizability across various modalities by leveraging GNNs alongside a reliable spatial attention map, ensuring more robust features within the skip connection. Through comprehensive experiments and analysis, TransGUNet achieved superior segmentation performance on six seen and eight unseen datasets, demonstrating significantly higher efficiency compared to previous methods.
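The entropy-driven feature selection step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the softmax normalization of each channel, the mean-plus-sigmoid pooling used to form the attention map, and the names `channel_entropy` and `efs_spatial_attention` are all assumptions made for illustration. Only the general idea (score each channel by entropy, keep the `m` lowest-entropy channels, build spatial attention from them) comes from the text.

```python
import numpy as np


def channel_entropy(features, eps=1e-12):
    """Shannon entropy of each channel in a (C, H, W) feature map.

    Assumption: every channel is converted to a probability distribution
    with a softmax over its H*W positions. The paper may normalize differently.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(flat)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + eps)).sum(axis=1)


def efs_spatial_attention(features, m):
    """Keep the m lowest-entropy channels and derive a spatial attention map.

    Assumption: the attention map is the sigmoid of the channel-wise mean
    of the selected channels (a stand-in for the paper's attention module).
    """
    scores = channel_entropy(features)
    bottom_m = np.argsort(scores)[:m]      # indices of the m lowest entropies
    pooled = features[bottom_m].mean(axis=0)
    attention = 1.0 / (1.0 + np.exp(-pooled))
    return attention, bottom_m


# Toy input: channels 1 and 3 are sharply peaked (low entropy),
# channels 0 and 2 are nearly flat (high entropy).
feats = np.zeros((4, 4, 4))
feats[1, 0, 0] = 10.0
feats[3, 0, 0] = 10.0
feats[3, 1, 1] = 10.0
feats[2] = 0.1 * np.arange(16).reshape(4, 4) / 16
attention, kept = efs_spatial_attention(feats, m=2)
```

In this toy input the flat channels receive entropy close to the maximum of log(16), so they are the ones filtered out, and the attention map is computed only from the peaked channels.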

Paper Structure

This paper contains 22 sections, 7 equations, 12 figures, 18 tables.

Figures (12)

  • Figure 1: Graph Visualization of TransGUNet. We selected three patches (Red, Blue, Green) and found the five nearest patches for each, based on the adjacency matrix of ACS-GNN, connecting them with lines to visualize the relationships using each color. This figure reveals that lesion patches exhibit high similarity with other lesion patches, while non-lesion patches similarly cluster together. This result demonstrates the model’s effectiveness in distinguishing lesion from non-lesion regions and maintaining strong intra-class correlations.
  • Figure 2: (a) The overall architecture of the proposed TransGUNet mainly comprises ACS-GNN and EFS-based spatial attention (see Fig. 3). (b) The proposed novel skip connection framework. In this figure, we set the target resolution as $(H_{t}, W_{t}) = (\frac{H}{8}, \frac{W}{8})$. For convenience, we assume that $C = 4C_{r}$. (c) Overall block diagram of the proposed ACS-GNN. (d) Notation description used in this paper. This notation is also used in Fig. 3.
  • Figure 3: The overall block diagram of the Entropy-driven Feature Selection with spatial attention. Low and high transparency red indicates a high and low entropy score, respectively. We produce the spatial attention map using the indices corresponding to the $M$ channels with the lowest entropy scores, called Bottom-$M$ index.
  • Figure 4: (a) Input image, (b) Feature map from ACS-GNN, (c) Entropy map corresponding to each feature map. We calculated the entropy map using Shannon entropy at the pixel level.
  • Figure 5: Comparison of parameters (M), FLOPs (G), inference speed (ms), and required GPU memory (G) vs segmentation performance (DSC) on average for all unseen datasets.
  • ...and 7 more figures
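
The nearest-patch lookup described in the Figure 1 caption can be sketched as follows. This is a hedged illustration only: it assumes the adjacency matrix is the cosine similarity between patch feature vectors, which the caption does not state, and the function name `knn_from_similarity` and the toy data are hypothetical.

```python
import numpy as np


def knn_from_similarity(nodes, k):
    """Return a similarity-based adjacency matrix and each node's k nearest nodes.

    nodes: (N, D) array, one feature vector per patch.
    Assumption: adjacency is cosine similarity; ACS-GNN may build it differently.
    """
    normed = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-12)
    adjacency = normed @ normed.T
    masked = adjacency.copy()
    np.fill_diagonal(masked, -np.inf)          # a patch is not its own neighbor
    neighbors = np.argsort(-masked, axis=1)[:, :k]
    return adjacency, neighbors


# Toy example: patches 0-2 point along one direction ("lesion-like"),
# patches 3-5 along another ("background-like").
patches = np.array([
    [1.0, 0.0], [1.0, 0.1], [1.0, -0.1],
    [0.0, 1.0], [0.1, 1.0], [-0.1, 1.0],
])
adjacency, neighbors = knn_from_similarity(patches, k=2)
```

With this toy data each patch's two nearest neighbors fall inside its own group, which mirrors the intra-class clustering the caption reports. The figure itself uses five neighbors per selected patch.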