Dual Cross-Attention for Medical Image Segmentation

Gorkem Can Ates, Prasoon Mohan, Emrah Celik

TL;DR

The paper addresses the persistent semantic gap in U-Net skip-connections for medical image segmentation by introducing Dual Cross-Attention (DCA), a lightweight module that sequentially models channel- and spatial-wise dependencies across multi-scale encoder features. DCA comprises Channel Cross-Attention (CCA) and Spatial Cross-Attention (SCA), operating on tokens produced via a simple, parameter-efficient patch embedding that uses 2D average pooling and depthwise convolutions, and connects back to the decoder through upsampling. Across six U-Net-based architectures and five benchmark datasets, DCA yields consistent segmentation gains (up to 2.74% DSC on MoNuSeg) with minimal parameter overhead, validating its effectiveness as a bridge between encoder and decoder. This approach highlights the practical value of cross-scale, sequential attention for medical image segmentation, offering a scalable path to improved accuracy without heavy architectural changes.

Abstract

We propose Dual Cross-Attention (DCA), a simple yet effective attention module that is able to enhance skip-connections in U-Net-based architectures for medical image segmentation. DCA addresses the semantic gap between encoder and decoder features by sequentially capturing channel and spatial dependencies across multi-scale encoder features. First, the Channel Cross-Attention (CCA) extracts global channel-wise dependencies by utilizing cross-attention across channel tokens of multi-scale encoder features. Then, the Spatial Cross-Attention (SCA) module performs cross-attention to capture spatial dependencies across spatial tokens. Finally, these fine-grained encoder features are up-sampled and connected to their corresponding decoder parts to form the skip-connection scheme. Our proposed DCA module can be integrated into any encoder-decoder architecture with skip-connections such as U-Net and its variants. We test our DCA module by integrating it into six U-Net-based architectures such as U-Net, V-Net, R2Unet, ResUnet++, DoubleUnet and MultiResUnet. Our DCA module shows Dice Score improvements up to 2.05% on GlaS, 2.74% on MoNuSeg, 1.37% on CVC-ClinicDB, 1.12% on Kvasir-Seg and 1.44% on Synapse datasets. Our codes are available at: https://github.com/gorkemcanates/Dual-Cross-Attention
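
The sequence described in the abstract (tokenize multi-scale encoder features, apply channel cross-attention, then spatial cross-attention) can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the learned depthwise-convolution projections, layer normalization, residual connections, and final upsampling are all omitted, and the scaling factor, the choice of queries, keys, and values, and the function names (`avg_pool_tokens`, `channel_cross_attention`, `spatial_cross_attention`) are hypothetical choices made only for this example. See the linked repository for the actual code.

```python
import numpy as np

def avg_pool_tokens(feat, patch):
    """Turn a (C, H, W) feature map into (P, C) tokens by averaging
    non-overlapping patch x patch windows (no learned projection here)."""
    c, h, w = feat.shape
    assert h % patch == 0 and w % patch == 0
    pooled = feat.reshape(c, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    return pooled.reshape(c, -1).T  # P = (H / patch) * (W / patch)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(tokens):
    """Assumed form: each scale's channels attend to the channels of all
    scales concatenated together. Projections are omitted."""
    t_cat = np.concatenate(tokens, axis=1)            # (P, C_total)
    scale = np.sqrt(t_cat.shape[1])                   # assumed scaling factor
    out = []
    for t in tokens:
        attn = softmax(t.T @ t_cat / scale, axis=-1)  # (C_i, C_total)
        out.append((attn @ t_cat.T).T)                # back to (P, C_i)
    return out

def spatial_cross_attention(tokens):
    """Assumed form: one (P, P) attention map computed from the concatenated
    tokens re-weights the spatial positions of every scale."""
    t_cat = np.concatenate(tokens, axis=1)            # (P, C_total)
    scale = np.sqrt(t_cat.shape[1])                   # assumed scaling factor
    attn = softmax(t_cat @ t_cat.T / scale, axis=-1)  # (P, P)
    return [attn @ t for t in tokens]                 # each stays (P, C_i)

# Three hypothetical encoder stages; patch sizes are chosen so that every
# stage yields the same number of tokens (4 x 4 = 16).
rng = np.random.default_rng(0)
feats = [rng.standard_normal((16, 32, 32)),
         rng.standard_normal((32, 16, 16)),
         rng.standard_normal((64, 8, 8))]
patches = [8, 4, 2]

tokens = [avg_pool_tokens(f, p) for f, p in zip(feats, patches)]
refined = spatial_cross_attention(channel_cross_attention(tokens))
print([r.shape for r in refined])  # [(16, 16), (16, 32), (16, 64)]
```

Each stage keeps its own channel width while sharing a common token count, which is what allows attention to be computed across scales. In a full model the refined tokens would be reshaped to a grid, upsampled, and passed to the matching decoder stage.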

Paper Structure (15 sections, 9 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Encoder-decoder architecture with our proposed DCA block. The DCA block can be integrated into any encoder-decoder architecture with skip-connections. It takes multi-scale features from different encoder stages, produces enhanced representations, and connects them to their decoder counterparts.
  • Figure 2: (a) Architecture of our proposed DCA block. It consists of (b) Channel Cross-Attention and (c) Spatial Cross-Attention modules to capture long-range interactions.
  • Figure 3: Visual comparison of plain and DCA-integrated models on (a) GlaS, (b) MoNuSeg, (c) CVC-ClinicDB, (d) Kvasir-Seg, and (e) Synapse.