EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation

Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu

TL;DR

EMCAD is a compact multi-scale convolutional attention decoder for 2D medical image segmentation. It combines a multi-scale convolutional attention module (MSCAM) with a large-kernel grouped attention gate (LGAG) to refine multi-stage encoder features using depth-wise, multi-scale convolutions. Paired with lightweight and standard vision encoders (e.g., PVTv2-B0/B2), EMCAD achieves state-of-the-art performance across 12 datasets spanning six tasks, while its depth-wise operations and large-kernel grouped attention sharply reduce parameter count and FLOPs. The MUTATION loss aggregates predictions from all decoder stages to guide training, which contributes to robust multi-scale predictions. Because it maintains high accuracy and generalizes across encoders and segmentation tasks, EMCAD is suited to resource-constrained clinical settings.

Abstract

An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.
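
The efficiency claim above rests on group and depth-wise convolution. As a rough illustration (not the authors' code), the sketch below counts the weights in a set of parallel multi-scale convolutions, once as standard convolutions and once as depth-wise ones. The channel count (64), the kernel sizes (1, 3, 5), and the helper name `conv_params` are assumptions chosen for the example, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a 2D convolution with a k x k kernel (no bias).

    Each output channel only sees c_in // groups input channels, so grouping
    divides the weight count by `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


# Illustrative values only; they are not taken from the source text.
channels = 64
kernel_sizes = (1, 3, 5)

# Standard convolutions: every output channel mixes all input channels.
standard = sum(conv_params(channels, channels, k) for k in kernel_sizes)

# Depth-wise convolutions: groups == channels, one filter per channel.
depthwise = sum(
    conv_params(channels, channels, k, groups=channels) for k in kernel_sizes
)

reduction = 1 - depthwise / standard

print(f"standard:   {standard} weights")
print(f"depth-wise: {depthwise} weights")
print(f"reduction:  {reduction:.1%}")
```

Under these assumptions the depth-wise variant needs a factor of `channels` fewer weights, independent of kernel size. A depth-wise layer does not mix channels by itself, so this count shows only where the savings come from, not the full cost of a decoder block.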

Paper Structure

This paper contains 33 sections, 14 equations, 5 figures, and 11 tables.

Figures (5)

  • Figure 1: Average DICE scores vs. #FLOPs for different methods over 10 binary medical image segmentation datasets. As shown, our approaches (PVT-EMCAD-B0 and PVT-EMCAD-B2) have the lowest #FLOPs, yet the highest DICE scores.
  • Figure 2: Hierarchical encoder with newly proposed EMCAD decoder architecture. (a) CNN or transformer encoder with four hierarchical stages, (b) EMCAD decoder, (c) Efficient up-convolution block (EUCB), (d) Multi-scale convolutional attention module (MSCAM), (e) Multi-scale convolution block (MSCB), (f) Multi-scale (parallel) depth-wise convolution (MSDC), (g) Large-kernel grouped attention gate (LGAG), (h) Channel attention block (CAB), and (i) Spatial attention block (SAB). X1, X2, X3, and X4 are the features from the four stages of the hierarchical encoder. p1, p2, p3, and p4 are output segmentation maps from four stages of our decoder.
  • Figure 3: Average DICE scores vs. #Params for different methods over 10 binary medical image segmentation datasets. As shown, our proposed approaches (PVT-EMCAD-B0 and PVT-EMCAD-B2) have the fewest parameters, yet the highest DICE scores.
  • Figure 4: Qualitative results of multi-organ segmentation on the Synapse Multi-organ dataset. The red rectangular boxes highlight organs incorrectly segmented by SOTA methods.
  • Figure 5: Qualitative results of polyp segmentation. The red rectangular boxes highlight polyps incorrectly segmented by SOTA methods.
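
The Figure 2 caption names four stage outputs, p1 to p4, and the TL;DR says the MUTATION loss aggregates predictions from all decoder stages. The text here does not spell out the aggregation rule. The sketch below shows one possible reading, assuming every non-empty combination of stage outputs is summed element-wise and scored against the target. The helper names, the toy mean-squared-error loss, and the two-element "maps" are all hypothetical placeholders, not the authors' implementation.

```python
from itertools import combinations


def nonempty_subsets(names):
    """All non-empty combinations of the given names, smallest first."""
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]


def add_maps(maps):
    """Element-wise sum of equally sized prediction maps (flat lists here)."""
    return [sum(values) for values in zip(*maps)]


def mse(pred, target):
    """Toy loss used only for illustration."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def aggregated_loss(outputs, target, loss_fn=mse):
    """Sum loss_fn over every non-empty combination of stage outputs."""
    total = 0.0
    for subset in nonempty_subsets(sorted(outputs)):
        combined = add_maps([outputs[name] for name in subset])
        total += loss_fn(combined, target)
    return total


# Toy stage outputs: p1 already matches the target, the others are all zero.
stage_outputs = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 0.0],
    "p3": [0.0, 0.0],
    "p4": [0.0, 0.0],
}
target = [1.0, 0.0]

subsets = nonempty_subsets(sorted(stage_outputs))
loss = aggregated_loss(stage_outputs, target)
print(len(subsets), loss)
```

With four stage outputs this enumeration produces 2^4 - 1 = 15 combinations, so under this reading each stage is supervised both on its own and jointly with every other stage.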