GCA-ResUNet: Image segmentation in medical images using grouped coordinate attention
Jun Ding, Shang Gao
TL;DR
This work tackles the challenge of achieving accurate medical image segmentation without prohibitive computation by introducing GCA-ResUNet, an attention-augmented convolutional network that preserves CNN efficiency while incorporating global context through Grouped Coordinate Attention (GCA). The backbone is a ResNet-50–based U-Net with GCA modules embedded in its residual blocks, enabling joint channel and directional spatial modeling with minimal overhead. Experiments on Synapse and ACDC yield Dice similarity coefficients (DSC) of $86.11\%$ and $92.64\%$, respectively, along with improved boundary delineation relative to strong baselines, indicating robustness on complex anatomy and blurred boundaries. The results suggest that GCA provides a practical path to endow convolutional architectures with global modeling capabilities suitable for resource-constrained clinical deployments, with potential extensions to multi-task and 3D segmentation.
Abstract
Medical image segmentation underpins computer-aided diagnosis and therapy by supporting clinical diagnosis, preoperative planning, and disease monitoring. While U-Net-style convolutional neural networks perform well thanks to their encoder-decoder structure with skip connections, they struggle to capture long-range dependencies. Transformer-based variants address global context but often require heavy computation and large training datasets. This paper proposes GCA-ResUNet, an efficient segmentation network that integrates Grouped Coordinate Attention (GCA) into ResNet-50 residual blocks. GCA uses grouped coordinate modeling to jointly encode global dependencies across channels and spatial locations, strengthening feature representation and boundary delineation while adding far less parameter and FLOP overhead than self-attention. GCA-ResUNet achieves a Dice score of 86.11% on the Synapse dataset and 92.64% on the ACDC dataset, surpassing several state-of-the-art baselines while maintaining fast inference and favorable computational efficiency. These results indicate that GCA offers a practical way to enhance convolutional architectures with global modeling capability, enabling high-accuracy and resource-efficient medical image segmentation.
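For readers unfamiliar with the reported metric, the Dice similarity coefficient between a predicted region $P$ and a ground-truth region $G$ is $DSC = 2|P \cap G| / (|P| + |G|)$. The following is a minimal sketch of how a per-class and mean Dice score could be computed on flattened label maps. It is an illustration, not the evaluation code behind the numbers above: the function names `dice_score` and `mean_dice`, the toy label lists, the convention of returning 1.0 when a class is absent from both maps, and the averaging over foreground classes only are all assumptions made here.

```python
def dice_score(pred, target, label):
    """Dice coefficient for one class, given two flat lists of integer labels."""
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    # |P ∩ G|: pixels assigned to `label` in both prediction and ground truth
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    # |P| + |G|
    total = sum(pred_mask) + sum(target_mask)
    if total == 0:
        # Assumed convention: a class absent from both maps counts as perfect agreement
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred, target, labels):
    """Unweighted average of per-class Dice over the given class labels."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


# Toy example: a 6-pixel "image" with background 0 and two foreground classes
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]

per_class = {c: dice_score(pred, target, c) for c in (1, 2)}
average = mean_dice(pred, target, labels=(1, 2))
```

In this toy case class 1 overlaps on one pixel out of three labeled in total (Dice 2/3) and class 2 on two pixels out of five (Dice 0.8), so the mean over the two foreground classes is about 0.733.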
