Dual Cross-Attention for Medical Image Segmentation
Gorkem Can Ates, Prasoon Mohan, Emrah Celik
TL;DR
The paper addresses the persistent semantic gap between encoder and decoder features in U-Net skip-connections for medical image segmentation. It introduces Dual Cross-Attention (DCA), a lightweight module that sequentially models channel-wise and spatial-wise dependencies across multi-scale encoder features. DCA consists of Channel Cross-Attention (CCA) followed by Spatial Cross-Attention (SCA). Both operate on tokens produced by a simple, parameter-efficient patch embedding built from 2D average pooling and depthwise convolutions, and the refined features are upsampled and passed to the corresponding decoder stages. Across six U-Net-based architectures and five benchmark datasets, DCA yields consistent Dice score (DSC) gains, up to 2.74% on MoNuSeg, with minimal parameter overhead, which supports its role as a bridge between encoder and decoder. The results suggest that sequential cross-scale attention is a practical way to improve medical image segmentation accuracy without major architectural changes.
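The tokenization step can be illustrated with a small NumPy sketch. It assumes four encoder stages whose pooling size halves at each deeper stage, so that every stage produces the same number of spatial tokens. It also models the depthwise convolution as a per-channel 1D filter applied along the flattened token sequence. The function names (`avg_pool2d`, `depthwise_conv1d`, `patch_embed`), the stage sizes and the random filters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def avg_pool2d(x, p):
    """Non-overlapping 2D average pooling of a (C, H, W) map; kernel = stride = p."""
    c, h, w = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the pool size"
    return x.reshape(c, h // p, p, w // p, p).mean(axis=(2, 4))

def depthwise_conv1d(tokens, kernels):
    """Per-channel 1D convolution with 'same' zero padding (assumed form of the
    depthwise projection). tokens: (N, C); kernels: (C, k), one odd-length filter per channel."""
    n, _ = tokens.shape
    k = kernels.shape[1]
    pad = k // 2
    padded = np.pad(tokens, ((pad, pad), (0, 0)))
    out = np.zeros_like(tokens)
    for j in range(k):
        # Each channel is filtered independently: no mixing across channels.
        out += padded[j:j + n, :] * kernels[:, j]
    return out

def patch_embed(feature, pool, kernels):
    """Pool a (C, H, W) feature, flatten it to N tokens of width C, then project."""
    pooled = avg_pool2d(feature, pool)
    tokens = pooled.reshape(pooled.shape[0], -1).T  # (N, C), N = (H/pool) * (W/pool)
    return depthwise_conv1d(tokens, kernels)

# Hypothetical encoder: pool sizes halve per stage, so every stage gives 4 x 4 = 16 tokens.
rng = np.random.default_rng(0)
channels = [8, 16, 32, 64]
sizes = [64, 32, 16, 8]
pools = [16, 8, 4, 2]
tokens = []
for c, s, p in zip(channels, sizes, pools):
    feature = rng.standard_normal((c, s, s))
    kernels = rng.standard_normal((c, 3)) * 0.1  # stand-in for learned filters
    tokens.append(patch_embed(feature, p, kernels))
print([t.shape for t in tokens])  # [(16, 8), (16, 16), (16, 32), (16, 64)]
```

Because pooling has no weights and the depthwise filter has only `C * k` parameters per stage, this kind of embedding stays cheap, which is consistent with the small parameter overhead reported above.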
Abstract
We propose Dual Cross-Attention (DCA), a simple yet effective attention module that enhances skip-connections in U-Net-based architectures for medical image segmentation. DCA addresses the semantic gap between encoder and decoder features by sequentially capturing channel and spatial dependencies across multi-scale encoder features. First, Channel Cross-Attention (CCA) extracts global channel-wise dependencies by applying cross-attention across the channel tokens of multi-scale encoder features. Then, Spatial Cross-Attention (SCA) performs cross-attention across spatial tokens to capture spatial dependencies. Finally, these fine-grained encoder features are upsampled and connected to their corresponding decoder stages to form the skip-connection scheme. DCA can be integrated into any encoder-decoder architecture with skip-connections, such as U-Net and its variants. We evaluate DCA by integrating it into six U-Net-based architectures: U-Net, V-Net, R2Unet, ResUnet++, DoubleUnet and MultiResUnet. DCA improves the Dice score by up to 2.05% on GlaS, 2.74% on MoNuSeg, 1.37% on CVC-ClinicDB, 1.12% on Kvasir-Seg and 1.44% on Synapse. Our code is available at: https://github.com/gorkemcanates/Dual-Cross-Attention
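The CCA, SCA and upsampling steps can be sketched in NumPy under simplifying assumptions. The query, key and value projections are identities, and any learned projections, normalization or residual layers are omitted. Following one reading of the description above, CCA uses each stage's tokens as queries and the channel-concatenated tokens of all stages as keys and values, so attention is computed between channels. SCA uses the concatenated tokens as queries and keys and each stage's own tokens as values, so attention is computed between spatial tokens. Nearest-neighbor upsampling stands in for the module's upsampling step. All function names and sizes are hypothetical; the linked repository is the reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(tokens):
    """tokens: list of (N, C_i) arrays, one per encoder stage, all with the same N.
    Queries: one stage's tokens; keys/values: all stages concatenated along channels."""
    t_cat = np.concatenate(tokens, axis=1)              # (N, C_sum)
    scale = np.sqrt(t_cat.shape[1])
    out = []
    for t in tokens:
        attn = softmax(t.T @ t_cat / scale)             # (C_i, C_sum): channel-to-channel
        out.append((attn @ t_cat.T).T)                  # back to (N, C_i)
    return out

def spatial_cross_attention(tokens):
    """Queries/keys: concatenated tokens; values: each stage's own tokens."""
    t_cat = np.concatenate(tokens, axis=1)              # (N, C_sum)
    attn = softmax(t_cat @ t_cat.T / np.sqrt(t_cat.shape[1]))  # (N, N): token-to-token
    return [attn @ t for t in tokens]                   # each (N, C_i)

def upsample_nearest(tokens, grid, factor):
    """Reshape (N, C) tokens to a (C, grid, grid) map and repeat each cell factor x factor."""
    fmap = tokens.T.reshape(-1, grid, grid)
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

# Hypothetical example: 4 stages, each with 16 tokens laid out on a 4 x 4 grid.
rng = np.random.default_rng(0)
channels = [8, 16, 32, 64]
factors = [16, 8, 4, 2]  # restores 64, 32, 16 and 8 pixels per side
tokens = [rng.standard_normal((16, c)) for c in channels]
refined = spatial_cross_attention(channel_cross_attention(tokens))  # CCA, then SCA
skips = [upsample_nearest(t, 4, f) for t, f in zip(refined, factors)]
print([s.shape for s in skips])  # [(8, 64, 64), (16, 32, 32), (32, 16, 16), (64, 8, 8)]
```

The sketch makes the shape logic visible: CCA builds one `(C_i, C_sum)` map per stage, SCA shares a single `(N, N)` map across stages, and each refined token set returns to its stage's resolution before being fed to the matching decoder stage.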
