Few-Shot Medical Image Segmentation with Large Kernel Attention

Xiaoxiao Wu, Xiaowei Chen, Zhenguo Gao, Shulei Qu, Yuanyuan Qiu

TL;DR

This work tackles the data scarcity challenge in medical image segmentation by presenting a few-shot segmentation model that learns comprehensive feature representations. It introduces a plug-and-play large kernel attention mechanism integrated with a dual-path feature extractor, adaptive prototype predictor, and multi-scale prediction fusion to capture local and long-range information across scales. The method constructs a single class prototype per episode and uses an adaptive threshold to generate segmentation masks, with a multi-scale fusion step improving robustness across organ sizes. Empirical evaluation on CHAOS and CMR demonstrates state-of-the-art mean Dice scores compared to recent baselines, underscoring the practical impact of combining large-kernel attention with multi-scale analysis for few-shot medical image segmentation.

Abstract

Medical image segmentation has advanced significantly with the emergence of deep learning. However, most neural network models rely on a substantial amount of annotated data, which remains a challenge for medical image segmentation. To address this issue, few-shot segmentation methods based on meta-learning have been employed. Current methods primarily focus on aligning the support set and query set to enhance performance, but this focus hinders further improvement of the model's effectiveness. In this paper, we propose a few-shot medical segmentation model that acquires comprehensive feature representation capabilities, boosting segmentation accuracy by capturing both local and long-range features. To achieve this, we introduce a plug-and-play attention module that dynamically enhances both query and support features, thereby improving the representativeness of the extracted features. Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module. Specifically, the dual-path feature extractor acquires multi-scale features by producing feature maps of size 32×32 and 64×64. The attention module follows the feature extractor and captures local and long-range information. The adaptive prototype prediction module automatically adjusts the anomaly score threshold to predict prototypes, while the multi-scale prediction fusion module integrates prediction masks of various scales to produce the final segmentation result. We conducted experiments on the publicly available MRI datasets CHAOS and CMR and compared our method with other advanced techniques. The results demonstrate that our method achieves state-of-the-art performance.
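The prototype prediction and multi-scale fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the prototype is built by masked average pooling over the support features (a common choice in prototype-based few-shot segmentation), the threshold is passed in as a fixed number rather than predicted adaptively, the threshold is applied to the cosine similarity directly rather than to an anomaly score, and the two scales are fused by nearest-neighbour upsampling followed by averaging. All function names are hypothetical.

```python
import math


def masked_average_prototype(features, mask):
    """Average the support feature vectors at foreground positions.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Assumption: masked average pooling, not confirmed by the source.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, is_foreground in zip(feature_row, mask_row):
            if is_foreground:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        raise ValueError("support mask has no foreground pixels")
    return [value / count for value in total]


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def predict_mask(query_features, prototype, threshold):
    """Label a query position as foreground when its similarity to the
    prototype exceeds the threshold (here a fixed, hypothetical input)."""
    return [
        [1 if cosine(vector, prototype) > threshold else 0 for vector in row]
        for row in query_features
    ]


def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling of an H x W mask by an integer factor."""
    upsampled = []
    for row in mask:
        wide_row = [value for value in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


def fuse_masks(fine_mask, coarse_mask, factor=2):
    """Upsample the coarse mask to the fine resolution and average the two.
    Averaging is an assumed fusion rule, chosen only for illustration."""
    coarse_up = upsample_nearest(coarse_mask, factor)
    return [
        [(a + b) / 2 for a, b in zip(fine_row, coarse_row)]
        for fine_row, coarse_row in zip(fine_mask, coarse_up)
    ]
```

In the setting described above, `fine_mask` would correspond to the 64×64 prediction and `coarse_mask` to the 32×32 prediction, so `factor=2` brings both to the same resolution before fusion.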

Paper Structure (24 sections, 13 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The proposed model.
  • Figure 2: Illustration of the model. The model integrates feature information from two scales, 64×64 and 32×32. The feature flow of the 64×64 scale is illustrated here. The model consists of three layers: feature extractor and attention layer, adaptive prototype prediction layer, and multi-scale prediction fusion layer. The feature extractor and attention module extract comprehensive feature representations of query and support images. The adaptive prototype prediction layer generates predicted segmentation masks by calculating cosine similarity between foreground prototype and query feature and obtaining an adaptive threshold T. The multi-scale prediction fusion layer upsamples masks of different scales and fuses them to produce the final mask prediction result.
  • Figure 3: Attention module. DW-Conv denotes depth-wise convolution, DW-D-Conv denotes depth-wise dilated convolution, and 1×1 Conv denotes 1×1 convolution.
  • Figure 4: Comparison of segmentation results under Setting 1 on the CMR dataset. From left to right: ADNet, Q-Net, CRAPNet, our proposed method, and ground truth (GT). From top to bottom: LV-MYO, LV-BP, and RV.
  • Figure 5: Comparison of segmentation results under Setting 1 on the CHAOS dataset. From left to right: ALPNet, ADNet, Q-Net, CRAPNet, our proposed method, and ground truth (GT). From top to bottom: Liver, R.kidney, L.kidney, and Spleen.
  • ...and 1 more figure
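
The three operations named in the Figure 3 caption (DW-Conv, DW-D-Conv, 1×1 Conv) can be sketched in plain Python to show how they combine into a large-kernel attention map. This is a rough illustration, not the authors' module. The ordering of the operations and the final element-wise multiplication with the input follow the usual large kernel attention formulation and are assumptions here, as are the odd kernel sizes, the zero padding, and the default dilation of 3. All function and parameter names are hypothetical.

```python
def depthwise_conv2d(x, kernels, dilation=1):
    """Depth-wise convolution: each channel is filtered by its own k x k kernel.

    x: C x H x W nested lists; kernels: C x k x k nested lists (k odd).
    Zero padding keeps the H x W size; dilation > 1 spreads the kernel taps.
    """
    output = []
    for channel, kernel in zip(x, kernels):
        height, width, k = len(channel), len(channel[0]), len(kernel)
        half = (k // 2) * dilation
        plane = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        ii = i + u * dilation - half
                        jj = j + v * dilation - half
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[u][v] * channel[ii][jj]
                plane[i][j] = acc
        output.append(plane)
    return output


def pointwise_conv(x, weights):
    """1x1 convolution: mixes channels at each position.

    weights: C_out x C_in nested lists.
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
            for i in range(height)
        ]
        for w in weights
    ]


def large_kernel_attention(x, dw_kernels, dwd_kernels, pw_weights, dilation=3):
    """DW-Conv, then DW-D-Conv, then 1x1 Conv produce an attention map,
    which re-weights the input element-wise (assumed gating step)."""
    attention = depthwise_conv2d(x, dw_kernels)
    attention = depthwise_conv2d(attention, dwd_kernels, dilation=dilation)
    attention = pointwise_conv(attention, pw_weights)
    return [
        [[a * v for a, v in zip(att_row, x_row)] for att_row, x_row in zip(att_plane, x_plane)]
        for att_plane, x_plane in zip(attention, x)
    ]
```

Stacking a small depth-wise kernel with a dilated depth-wise kernel gives a large receptive field with far fewer weights than a single dense large kernel, which is the usual motivation for this decomposition. The 1×1 convolution requires as many output channels as input channels so that the attention map matches the input shape.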