J-CaPA : Joint Channel and Pyramid Attention Improves Medical Image Segmentation

Marzia Binta Nizam, Marian Zlateva, James Davis

TL;DR

This work proposes a transformer-based architecture that jointly applies Channel Attention and Pyramid Attention to improve multi-scale feature extraction for medical image segmentation. On the Synapse multi-organ dataset, it reports a 6.9% improvement in Mean Dice score and a 39.9% improvement in Hausdorff Distance (HD95) over an implementation without these enhancements, with more accurate segmentation of complex anatomical structures.

Abstract

Medical image segmentation is crucial for diagnosis and treatment planning. Traditional CNN-based models, like U-Net, have shown promising results but struggle to capture long-range dependencies and global context. To address these limitations, we propose a transformer-based architecture that jointly applies Channel Attention and Pyramid Attention mechanisms to improve multi-scale feature extraction and enhance segmentation performance for medical images. Because increased model complexity demands more training data, we further improve model generalization with CutMix data augmentation. We evaluate our approach on the Synapse multi-organ segmentation dataset, where it achieves a 6.9% improvement in Mean Dice score and a 39.9% improvement in 95th-percentile Hausdorff Distance (HD95) over an implementation without our enhancements. Our proposed model demonstrates improved segmentation accuracy for complex anatomical structures, outperforming existing state-of-the-art methods.
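
The abstract names CutMix as the augmentation but does not describe how it is applied to segmentation pairs. The following is a minimal sketch of one common adaptation, assuming 2D slices and that the same rectangular region is pasted into both the image and its label mask. The function name `cutmix_segmentation`, the `alpha` parameter of the Beta distribution, and the box-sampling details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cutmix_segmentation(img_a, mask_a, img_b, mask_b, rng, alpha=1.0):
    """Paste one random box from (img_b, mask_b) into copies of (img_a, mask_a).

    Assumption: image and mask share their last two (height, width) axes,
    so the same box can be cut from both.
    """
    h, w = mask_a.shape[-2:]
    lam = rng.beta(alpha, alpha)              # fraction of sample A to keep
    cut_h = int(h * np.sqrt(1.0 - lam))       # box covers roughly (1 - lam) of the area
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)   # box center
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    img, mask = img_a.copy(), mask_a.copy()   # leave the inputs untouched
    img[..., y0:y1, x0:x1] = img_b[..., y0:y1, x0:x1]
    mask[..., y0:y1, x0:x1] = mask_b[..., y0:y1, x0:x1]
    return img, mask, (y0, y1, x0, x1)
```

Because the label mask is cut and pasted together with the image, each pixel keeps a single hard label, unlike classification CutMix, where the two labels are blended by the area ratio.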

Paper Structure

This paper contains 18 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed framework. Input medical images are processed by a CNN-based encoder, followed by Transformer layers and our Joint Attention blocks (combining Pyramid and Channel Attention). Features at multiple scales are refined by Joint Attention and passed through skip connections to the decoder. The decoder performs CNN-based up-sampling to generate high-resolution segmentation maps, capturing detailed anatomical structures.
  • Figure 2: Visual comparison of segmentation results on the Synapse dataset. Each row represents a different case, with the columns showing: (1) the original CT image, (2) ground truth segmentation labels, (3) predictions from Trans-UNet, (4) predictions from SAMed, and (5) predictions from our proposed model.
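
The Joint Attention block described in the Figure 1 caption combines Pyramid and Channel Attention, but the exact formulation is not given in this summary. The sketch below is only one plausible reading, under several assumptions: channel attention is a squeeze-and-excitation style gate, pyramid attention is a spatial gate built from average-pooled context at scales 1, 2, and 4, and the two branches are summed. All function names, the weights `w1` and `w2`, and the choice of scales are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """SE-style gate. x: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    squeezed = x.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck with ReLU
    gate = sigmoid(w2 @ hidden)                # per-channel weight in (0, 1)
    return x * gate[:, None, None]

def avg_pool(x, k):
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbor upsampling by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def pyramid_attention(x, scales=(1, 2, 4)):
    """Spatial gate from channel-averaged context pooled at several scales."""
    context = np.zeros(x.shape[1:])
    for k in scales:
        pooled = avg_pool(x, k).mean(axis=0, keepdims=True)   # (1, H/k, W/k)
        context += upsample_nearest(pooled, k)[0]             # back to (H, W)
    gate = sigmoid(context / len(scales))      # per-pixel weight in (0, 1)
    return x * gate[None, :, :]

def joint_attention(x, w1, w2, scales=(1, 2, 4)):
    # Assumption: the two branches are combined by summation.
    return channel_attention(x, w1, w2) + pyramid_attention(x, scales)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))    # (C, H, W) feature map
w1 = rng.standard_normal((2, 8))               # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
refined = joint_attention(features, w1, w2)    # same shape as the input
```

In the framework of Figure 1, a block of this kind would refine features at each scale before they pass through the skip connections to the decoder. A trained model would learn `w1` and `w2` rather than sample them at random.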