EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu
TL;DR
EMCAD is a compact multi-scale convolutional attention decoder for 2D medical image segmentation. It combines a multi-scale convolutional attention module (MSCAM) with a large-kernel grouped attention gate (LGAG) to refine multi-stage encoder features using depth-wise, multi-scale convolutions. Paired with lightweight and standard vision encoders (e.g., PVTv2-B0/B2), EMCAD achieves state-of-the-art performance on 12 datasets spanning six segmentation tasks while reducing parameter count and FLOPs by 79.4% and 80.3%, respectively. The MUTATION loss aggregates predictions from all decoder stages to guide training, which supports robust multi-scale predictions. Because it maintains high accuracy and generalizes across encoders and segmentation tasks, EMCAD is a practical decoding choice for resource-constrained clinical settings.
Abstract
An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, existing decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with reductions of 79.4% and 80.3% in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD.
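
To give a sense of why group and depth-wise convolutions keep the decoder small, the following is a minimal sketch that counts convolution weights for a standard versus a depth-wise multi-scale block. It is an illustration only, not the EMCAD implementation: the helper names `conv_params` and `multi_scale_depthwise_params`, the channel width of 64, and the kernel sizes (1, 3, 5) are assumptions chosen for the example, and the resulting numbers are not meant to reproduce the parameter counts reported above.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of learnable parameters in a 2D convolution with a k x k kernel.

    Each output channel sees only c_in / groups input channels, so the weight
    count shrinks by a factor of `groups` relative to a standard convolution.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def multi_scale_depthwise_params(channels, kernel_sizes=(1, 3, 5)):
    """Weights of parallel depth-wise branches (groups == channels), one per kernel size.

    The kernel sizes are an assumption made for this sketch.
    """
    return sum(conv_params(channels, channels, k, groups=channels)
               for k in kernel_sizes)


channels = 64  # hypothetical feature width for illustration

# Same three branches built from standard (groups=1) convolutions.
standard = sum(conv_params(channels, channels, k) for k in (1, 3, 5))

# The depth-wise variant of the same multi-scale block.
depthwise = multi_scale_depthwise_params(channels)

# Fraction of weights saved by going depth-wise.
reduction = 1 - depthwise / standard

print(f"standard: {standard}, depth-wise: {depthwise}, saved: {reduction:.2%}")
```

In this sketch the depth-wise block needs a factor of `channels` fewer weights than its standard counterpart, because each filter operates on a single channel instead of all of them. A grouped convolution sits between the two extremes and can be explored by passing an intermediate `groups` value to `conv_params`.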
