DoubleU-NetPlus: A Novel Attention and Context Guided Dual U-Net with Multi-Scale Residual Feature Fusion Network for Semantic Segmentation of Medical Images

Md. Rayhan Ahmed, Adnan Ferdous Ashrafi, Raihan Uddin Ahmed, Swakkhar Shatabda, A. K. M. Muzahidul Islam, Salekul Islam

TL;DR

This work addresses semantic segmentation of medical images whose regions of interest vary in scale and sit in complex contexts, a setting in which existing U-Net variants struggle to model high-level features and ambiguous boundaries. It introduces DoubleU-NetPlus, a dual U-Net framework that uses EfficientNetB7 as the feature encoder, a multi-kernel residual convolution (MKRC) module, and a squeeze-and-excitation-based atrous spatial pyramid pooling (SE-ASPP) module, together with a hybrid triple attention module (TAM) and a triple attention gate (TAG) that refine skip connections and emphasize relevant regions. Standard convolutions are replaced with attention-guided residual convolutions in both encoding and decoding, which is intended to allow a deeper network without sacrificing spatial precision. Evaluated on six public medical image segmentation (MIS) datasets spanning diverse modalities, the model is reported to reach state-of-the-art Dice and IoU results, with Dice scores of 85.17% (DRIVE), 99.34% (LUNA), 94.30% (BUSI), 96.40% (CVC-ClinicDB), 95.76% (2018 DSB), and 97.10% (ISBI 2012).
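For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how Dice and IoU are commonly computed for a pair of binary masks. The function name `dice_and_iou`, the nested-list mask format, and the smoothing constant `eps` are assumptions made for illustration; the paper's own evaluation code may differ in thresholding and averaging.

```python
def dice_and_iou(pred, target, eps=1e-7):
    """Return (dice, iou) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (an illustrative format).
    `eps` is a hypothetical smoothing term that avoids division by zero
    when both masks are empty.
    """
    p = [int(v) for row in pred for v in row]
    t = [int(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(a & b for a, b in zip(p, t))  # pixels set in both masks
    pred_area = sum(p)
    target_area = sum(t)
    union = pred_area + target_area - intersection

    dice = (2 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    dice, iou = dice_and_iou(pred, target)
    print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

The two scores are monotonically related (Dice = 2·IoU / (1 + IoU) for a single mask pair), so Dice is never lower than IoU on the same prediction.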

Abstract

Accurate segmentation of the region of interest in medical images can provide an essential pathway for devising effective treatment plans for life-threatening diseases. It is still challenging for U-Net, and its state-of-the-art variants, such as CE-Net and DoubleU-Net, to effectively model the higher-level output feature maps of the convolutional units of the network mostly due to the presence of various scales of the region of interest, intricacy of context environments, ambiguous boundaries, and multiformity of textures in medical images. In this paper, we exploit multi-contextual features and several attention strategies to increase networks' ability to model discriminative feature representation for more accurate medical image segmentation, and we present a novel dual U-Net-based architecture named DoubleU-NetPlus. The DoubleU-NetPlus incorporates several architectural modifications. In particular, we integrate EfficientNetB7 as the feature encoder module, a newly designed multi-kernel residual convolution module, and an adaptive feature re-calibrating attention-based atrous spatial pyramid pooling module to progressively and precisely accumulate discriminative multi-scale high-level contextual feature maps and emphasize the salient regions. In addition, we introduce a novel triple attention gate module and a hybrid triple attention module to encourage selective modeling of relevant medical image features. Moreover, to mitigate the gradient vanishing issue and incorporate high-resolution features with deeper spatial details, the standard convolution operation is replaced with the attention-guided residual convolution operations, ...
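The "adaptive feature re-calibrating attention" mentioned in the abstract builds on squeeze-and-excitation, so a small sketch of a generic squeeze-and-excitation step may help. This is not the authors' SE-ASPP module: the function name `squeeze_excite`, the bias-free weight matrices `w_reduce` and `w_expand`, and the nested-list tensor layout are all assumptions chosen to keep the example dependency-free.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Generic squeeze-and-excitation channel recalibration (illustrative).

    feature_maps: list of C channels, each an H x W nested list of floats.
    w_reduce:     R x C weights of the bottleneck layer (hypothetical values).
    w_expand:     C x R weights of the expansion layer (hypothetical values).
    Returns (rescaled_feature_maps, per_channel_gates).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Recalibration: scale every value in a channel by that channel's gate.
    rescaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return rescaled, gates


if __name__ == "__main__":
    maps = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
            [[3.0, 3.0], [3.0, 3.0]]]   # channel 1
    rescaled, gates = squeeze_excite(maps, [[0.5, 0.5]], [[1.0], [-1.0]])
    print("gates:", [round(g, 4) for g in gates])
```

In a trained network the two weight matrices are learned, so the gates come to reflect which channels are informative for the input at hand rather than the fixed values used here.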

Paper Structure

This paper contains 32 sections, 7 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Composition of proposed DoubleU-NetPlus architecture.
  • Figure 2: Composition of the AG-residual convolution module.
  • Figure 3: Composition of the Multi-kernel residual convolution (MKRC) module.
  • Figure 4: Composition of the squeeze and excitation-based atrous spatial pyramid pooling (SE-ASPP) module.
  • Figure 5: Composition of the triple attention module (TAM).
  • ...and 4 more figures
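
The SE-ASPP module named in Figure 4 relies on atrous (dilated) convolution, which widens the receptive field by spacing kernel taps apart instead of adding parameters. The sketch below shows the idea in one dimension only; the function name `dilated_conv1d`, the all-ones kernel, and the rates 1, 2, and 4 are illustrative assumptions and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1-D cross-correlation with kernel taps spaced `rate` samples apart.

    With rate=1 this is an ordinary convolution layer's sliding window;
    larger rates cover a wider span of the input with the same kernel size.
    """
    span = (len(kernel) - 1) * rate + 1  # input samples covered by one output
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    signal = list(range(9))   # 0, 1, ..., 8
    kernel = [1, 1, 1]        # hypothetical 3-tap kernel
    for rate in (1, 2, 4):    # hypothetical dilation rates
        span = (len(kernel) - 1) * rate + 1
        print(f"rate={rate}, span={span}:", dilated_conv1d(signal, kernel, rate))
```

An ASPP-style block runs several such branches in parallel at different rates (with padding so the outputs align) and fuses them, which is how context at multiple scales is gathered from a single feature map.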