TransGUNet: Transformer Meets Graph-based Skip Connection for Medical Image Segmentation
Ju-Hyeon Nam, Nur Suriza Syazwany, Sang-Chul Lee
TL;DR
TransGUNet tackles the semantic gap in skip connections for medical image segmentation by introducing an attentional cross-scale graph neural network (ACS-GNN) paired with entropy-driven feature selection (EFS) to produce reliable spatial attention. By converting cross-scale features into graphs and applying node-level attention with MRConv, alongside EFS-filtered channels, the approach captures complex anatomical structures while fusing global and local information efficiently. Empirical results across six seen and eight unseen datasets show TransGUNet achieves state-of-the-art performance with about 25 million parameters and around 10 GFLOPs, outperforming transformer- and CNN-based skip connection designs while maintaining efficiency. The work offers a robust, generalizable, and memory-conscious framework that could improve clinical segmentation tasks across modalities, with potential for deployment in real-world healthcare settings.
Abstract
Skip connection engineering is primarily employed to address the semantic gap between the encoder and decoder, while also integrating global dependencies to understand the relationships among complex anatomical structures in medical image segmentation. Although several models have proposed transformer-based approaches to incorporate global dependencies within skip connections, they often struggle to capture detailed local features and incur high computational complexity. In contrast, graph neural networks (GNNs) exploit graph structures to effectively capture local and global features. Leveraging these properties, we introduce an attentional cross-scale graph neural network (ACS-GNN), which enhances the skip connection framework by converting cross-scale feature maps into a graph structure and capturing complex anatomical structures through node attention. Additionally, we observed that deep learning models often produce uninformative feature maps, which degrades the quality of spatial attention maps. To address this problem, we integrated entropy-driven feature selection (EFS) with spatial attention, calculating an entropy score for each channel and filtering out high-entropy feature maps. Our innovative framework, TransGUNet, comprises ACS-GNN and EFS-based spatial attention to effectively enhance domain generalizability across various modalities by leveraging GNNs alongside a reliable spatial attention map, ensuring more robust features within the skip connection. Through comprehensive experiments and analysis, TransGUNet achieved superior segmentation performance on six seen and eight unseen datasets, demonstrating significantly higher efficiency compared to previous methods.
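The entropy-driven feature selection step described in the abstract (score each channel by entropy, drop the high-entropy ones, then build spatial attention from what remains) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so several details here are assumptions rather than the authors' method: each channel is turned into a probability distribution with a spatial softmax, Shannon entropy is used as the score, a fixed number `keep` of lowest-entropy channels is retained, and the attention map is a sigmoid over the mean of the retained channels. The function names `channel_entropy` and `efs_spatial_attention` are hypothetical.

```python
import numpy as np


def channel_entropy(features):
    """Shannon entropy per channel for a (C, H, W) feature map.

    Assumption: each channel becomes a distribution over spatial
    positions via a softmax; the paper may define this differently.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    # Subtract the per-channel max for a numerically stable softmax.
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def efs_spatial_attention(features, keep):
    """Keep the `keep` lowest-entropy channels and derive spatial attention.

    Returns the reweighted features, the indices of the kept channels,
    and the per-channel entropy scores.
    """
    scores = channel_entropy(features)
    kept = np.argsort(scores)[:keep]  # ascending: low entropy first
    # Assumption: attention = sigmoid of the mean over selected channels.
    pooled = features[kept].mean(axis=0)
    attention = 1.0 / (1.0 + np.exp(-pooled))
    return features * attention[None, :, :], kept, scores


# Toy input: channel 0 is flat (uninformative), channel 1 has one sharp peak.
demo = np.zeros((2, 4, 4))
demo[1, 2, 1] = 10.0
refined, kept, scores = efs_spatial_attention(demo, keep=1)
```

In this toy input the flat channel has the maximum possible entropy for a 4x4 map, so it is the one filtered out, and the attention map is driven by the peaked channel alone. A fixed `keep` count is only one possible selection rule; a threshold on the entropy score would be an equally plausible reading of the abstract.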
