PAM-UNet: Shifting Attention on Region of Interest in Medical Images
Abhijit Das, Debesh Jha, Vandan Gorade, Koushik Biswas, Hongyi Pan, Zheyuan Zhang, Daniela P. Ladner, Yury Velichko, Amir Borhani, Ulas Bagci
TL;DR
PAM-UNet tackles the challenge of accurate medical image segmentation on resource-constrained devices by integrating depthwise separable convolutions with a novel Progressive Luong Attention (PLA) mechanism within a U-shaped, mobile backbone. PLA progressively aggregates long-range dependencies from shallow features into the decoder, guided by encoder residuals, and is reinforced by an Attention Regularization loss to avoid over-attention. Empirical results on LiTS-2017 and Kvasir-SEG show PAM-UNet achieving competitive Dice and mIoU scores at a very low computational cost (a reported $1.32$ FLOPS), outperforming several backbone- and attention-based baselines. The work also provides mechanistic insights via CKA analyses and ablations, which identify PLA as the primary driver of the performance gains. Together, these results indicate that the model is suited to efficient, edge-capable medical image segmentation, with 3D and Transformer-integrated extensions left as future work.
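To make the efficiency argument behind depthwise separable convolutions concrete, the following minimal sketch compares weight counts for a standard convolution and its depthwise separable counterpart. The layer sizes (64 input channels, 128 output channels, a 3x3 kernel) and the function names are illustrative assumptions and are not taken from the PAM-UNet architecture; biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep)  # 73728 8768
```

For these assumed sizes the separable form needs roughly one eighth of the weights, which is the kind of saving that makes mobile-style backbones attractive on constrained hardware.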
Abstract
Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel \underline{P}rogressive \underline{A}ttention based \underline{M}obile \underline{UNet} (\underline{PAM-UNet}) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise \textit{Progressive Luong Attention} ($\mathcal{PLA}$) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a mean IoU of 74.65 and a Dice score of 82.87 while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.
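For readers less familiar with the two reported metrics, the sketch below computes the Dice score and IoU for a pair of flattened binary masks using their standard definitions. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the evaluation code used in the paper may differ, for example in how scores are averaged across images or classes.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as 0/1 lists."""
    intersection = sum(p & t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * intersection / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Toy flattened masks, not real data.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), iou)  # 0.6667 0.5
```

Because Dice counts the intersection twice, it is never lower than IoU on the same masks, which is consistent with the Dice score in the abstract being the larger of the two figures.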
