DCAU-Net: Differential Cross Attention and Channel-Spatial Feature Fusion for Medical Image Segmentation

Yanxin Li; Hui Wan; Libin Lan

TL;DR

DCAU-Net is a novel yet efficient medical image segmentation framework built on two key ideas: a Differential Cross Attention that computes the difference between two independent softmax attention maps to adaptively highlight discriminative structures, and a Channel-Spatial Feature Fusion strategy that adaptively recalibrates features from skip connections and up-sampling paths using sequential channel and spatial attention.
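
As a rough illustration of the "difference between two independent softmax attention maps", the sketch below assumes the formulation popularized by the Differential Transformer, (softmax(Q1 K1^T / sqrt(d)) - λ softmax(Q2 K2^T / sqrt(d))) V, with a fixed scalar λ. The paper's exact parameterization of λ, its normalization, and its projections are not given in this summary, so the function names and shapes here are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(q1, k1, q2, k2, v, lam):
    """Hypothetical sketch: (softmax(Q1 K1^T/sqrt(d)) - lam * softmax(Q2 K2^T/sqrt(d))) V.

    q1, q2: (N, d) queries; k1, k2: (M, d) keys; v: (M, d_v) values.
    lam is a fixed scalar here (it is learnable in differential attention).
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))  # first map, each row sums to 1
    a2 = softmax(q2 @ k2.T / np.sqrt(d))  # second, independent map
    diff = a1 - lam * a2                  # weight shared by both maps is subtracted
    return diff @ v, diff


rng = np.random.default_rng(0)
q1, q2 = rng.normal(size=(2, 5, 4))
k1, k2 = rng.normal(size=(2, 7, 4))
v = rng.normal(size=(7, 3))
out, diff = differential_attention(q1, k1, q2, k2, v, lam=0.8)
# out has shape (5, 3); each row of diff sums to 1 - lam
```

Because each softmax row sums to 1, each row of the difference map sums to 1 - λ, and attention mass that both maps place on the same irrelevant positions is partially cancelled, which is the intuition behind suppressing weights on irrelevant regions.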

Abstract

Accurate medical image segmentation requires effective modeling of both long-range dependencies and fine-grained boundary details. While transformers mitigate the lack of semantic context caused by the limited receptive field of convolutional neural networks, they introduce new challenges: standard self-attention incurs computational complexity quadratic in the number of tokens and often assigns non-negligible attention weights to irrelevant regions, diluting focus on discriminative structures and ultimately compromising segmentation accuracy. Existing attention variants, although effective in reducing computational complexity, fail to suppress redundant computation and inadvertently impair global context modeling. Furthermore, conventional fusion strategies in encoder-decoder architectures, typically based on simple concatenation or summation, cannot adaptively integrate high-level semantic information with low-level spatial details. To address these limitations, we propose DCAU-Net, a novel yet efficient segmentation framework built on two key ideas. First, a new Differential Cross Attention (DCA) computes the difference between two independent softmax attention maps to adaptively highlight discriminative structures. By replacing pixel-wise key and value tokens with window-level summary tokens, DCA dramatically reduces computational complexity without sacrificing precision. Second, a Channel-Spatial Feature Fusion (CSFF) strategy adaptively recalibrates features from skip connections and up-sampling paths using sequential channel and spatial attention, effectively suppressing redundant information and amplifying salient cues. Experiments on two public benchmarks, Synapse and ACDC, demonstrate that DCAU-Net achieves competitive performance with enhanced segmentation accuracy and robustness.
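
The sketch below illustrates why replacing pixel-wise keys and values with window-level summary tokens shrinks the attention map: every pixel still issues a query, but it attends only to one token per window. It assumes the summary tokens come from non-overlapping w x w average pooling and reuses a differential combination with a scalar λ; the pooling operator, the projection matrices (Wq1, Wk1, Wq2, Wk2, Wv) and all function names are hypothetical, not the authors' implementation.

```python
import numpy as np


def softmax(s):
    # Numerically stable softmax over the last axis
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def window_summary_tokens(x, w):
    """Average-pool an (H, W, C) map over non-overlapping w x w windows.

    Returns (H//w * W//w, C) summary tokens. Average pooling is an assumption.
    """
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    pooled = x.reshape(H // w, w, W // w, w, C).mean(axis=(1, 3))
    return pooled.reshape(-1, C)


def window_diff_cross_attention(x, w, params, lam):
    """Pixel-wise queries attend to window-level keys/values (hypothetical sketch)."""
    H, W, C = x.shape
    pix = x.reshape(-1, C)             # N = H*W pixel-wise query tokens
    win = window_summary_tokens(x, w)  # M = N / w**2 window-level tokens
    Wq1, Wk1, Wq2, Wk2, Wv = params    # hypothetical projection matrices
    d = Wq1.shape[1]
    a1 = softmax((pix @ Wq1) @ (win @ Wk1).T / np.sqrt(d))  # (N, M)
    a2 = softmax((pix @ Wq2) @ (win @ Wk2).T / np.sqrt(d))  # (N, M)
    out = (a1 - lam * a2) @ (win @ Wv)  # (N, d_v)
    return out.reshape(H, W, -1), a1.shape


# Usage: 16x16 map with 8 channels, 4x4 windows
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 8))
params = [rng.normal(size=(8, 8)) * 0.1 for _ in range(5)]
y, attn_shape = window_diff_cross_attention(x, 4, params, lam=0.5)
print(y.shape, attn_shape)  # (16, 16, 8) (256, 16)
```

For a 16x16 feature map (N = 256 pixel queries) and w = 4, the score matrix has 256 x 16 = 4,096 entries instead of 256 x 256 = 65,536 for full self-attention, a w^2 = 16-fold reduction; in general the cost drops from O(N^2) to O(N^2 / w^2).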

Paper Structure (16 sections, 11 equations, 4 figures, 5 tables)

Introduction
Method
Differential Cross Attention
Differential Cross Attention Block
Channel-Spatial Feature Fusion Block
Overall Architecture
Experiments
Datasets
Implementation Details
Comparison on Synapse Dataset
Comparison on ACDC Dataset
Ablation Study
Effectiveness of Pre-trained Weights
Effectiveness of Different Attention and $\lambda$ Initialization Strategy in DCA
Effectiveness of CSFF
...and 1 more section

Figures (4)

Figure 1: Details of the differential cross attention. It performs efficient cross attention between pixel-wise queries and window-level key–value pairs via differential attention, suppressing redundancy and enhancing focus on discriminative structures.
Figure 2: Details of the DCA Block, consisting of a 3$\times$3 depth-wise convolution, a DCA module, and a 2-layer MLP.
Figure 3: Overall architecture of the proposed DCAU-Net. The network adopts a U-shaped encoder-decoder framework with four hierarchical stages. The encoder integrates DCA blocks, centered on the differential cross attention. Features from the encoder are transferred to the decoder via skip connections and adaptively fused with those from previous decoder layers through CSFF blocks to enhance segmentation accuracy.
Figure 4: Qualitative comparisons of our approach against other state-of-the-art methods on the Synapse dataset.
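
Figure 3 describes encoder features arriving through skip connections and being fused with previous decoder features in CSFF blocks, and the abstract specifies sequential channel and spatial attention. The following is a minimal NumPy sketch in the style of CBAM-like channel-then-spatial gating. Concatenation as the initial merge, the average/max pooled descriptors, the reduction MLP (W1, W2) and the k x k spatial kernel are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, W1, W2):
    """Reweight channels of x (C, H, W) with a shared 2-layer MLP (assumed design)."""
    avg = x.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))    # (C,) max-pooled descriptor

    def mlp(v):
        return W2 @ np.maximum(W1 @ v, 0.0)  # C -> C/r -> C with ReLU

    weights = sigmoid(mlp(avg) + mlp(mx))    # (C,) values in (0, 1)
    return x * weights[:, None, None]


def spatial_attention(x, kernel):
    """Reweight positions of x (C, H, W) with a k x k conv over [mean, max] maps."""
    maps = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]  # odd kernel size assumed
    p = k // 2
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))   # zero padding keeps H, W
    H, W = maps.shape[1:]
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]


def csff(skip, up, W1, W2, kernel):
    """Hypothetical CSFF: concatenate, then channel attention, then spatial attention."""
    fused = np.concatenate([skip, up], axis=0)  # (C_skip + C_up, H, W)
    return spatial_attention(channel_attention(fused, W1, W2), kernel)


rng = np.random.default_rng(0)
skip = rng.normal(size=(4, 8, 8))   # encoder features via skip connection
up = rng.normal(size=(4, 8, 8))     # up-sampled decoder features
W1 = rng.normal(size=(2, 8)) * 0.1  # reduction ratio r = 4 on 8 fused channels
W2 = rng.normal(size=(8, 2)) * 0.1
kernel = rng.normal(size=(2, 7, 7)) * 0.1
out = csff(skip, up, W1, W2, kernel)
print(out.shape)  # (8, 8, 8)
```

Applying the channel gate before the spatial gate mirrors the "sequential channel and spatial attention" wording. Since both gates lie in (0, 1), this sketch only rescales responses downward, so salient cues are emphasized relative to suppressed ones rather than boosted in absolute terms.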
