Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

TL;DR

A novel Detail Self-refined Prototype Network (DSPNet) is proposed to construct high-fidelity prototypes that represent the object foreground and the background more comprehensively, and to integrate channel-specific structural information under sparse channel-aware regulation.

Abstract

Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with clearly distinct objects and backgrounds of modest complexity, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) that constructs high-fidelity prototypes representing the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modelling the multi-modal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods.

Paper Structure

This paper contains 27 sections, 14 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Scheme comparison. To counter the local information loss caused by the pooling operation, the previous detail discovery scheme incrementally mines new prototypes to capture more details. Our scheme instead features a design called detail self-refining, which encourages high-fidelity prototypes. To enhance the deep representation of details, our self-refining works in a different way: the foreground class prototype is refreshed by fusing the cluster-mined semantics, whilst background prototypes are enhanced by incorporating channel-specific structural information.
  • Figure 2: Overview of the proposed DSPNet. The segmentation pipeline of DSPNet follows three steps sequentially: the feature extractor $f(\cdot)$ embeds the support image $I^{s}$ and query image $I^{q}$ into deep features $F_{s}$ and $F_{q}$, respectively; after that, the prototypes are generated by the detail self-refining block $P_k={\rm{DSR}}(F_{s},F_{q},M_s)$; finally, the query image is segmented by measuring the cosine similarity between each prototype and the query features at each pixel. In the block ${\rm{DSR}}(\cdot,\cdot,\cdot)$, RAN calibrates $F_{s}$ and $F_{q}$ to filter out irrelevant objects and noise, and then the high-fidelity class prototype and background prototypes are generated via FSPA and BCMA, respectively.
  • Figure 3: Architecture of Resemblance Attention Network.
  • Figure 4: FSPA illustration. (a) shows the FSPA architecture, where both the cosine similarity computation Ⓒ and the channel-wise prototype fusion Ⓓ are implemented in a one-dimensional convolution manner. For Ⓒ, the cluster prototypes $P_c$ serve as convolution filters individually. Regarding Ⓓ, the channel-wise convolution filters are generated from $P_c$ by channel-dimensional slicing (see (b)), whilst the prototype fusion over probability maps $\phi(S_s)$ is demonstrated in (c).
  • Figure 5: Illustration of BCMA. (a) shows the proposed BCMA architecture, where the controllable multi-head channel attention refreshes the raw detail prototypes $P_n$ to $P_a$ by incorporating channel-specific structural information. Taking $P_n^k$ as an example, (b) presents its detail self-refining into the corresponding high-fidelity prototype $P_a^k$, whose $j$-th element $P_a^{k,j}$ is generated by attention head $h_j$. (c) elaborates $h_j$, where the sparse channel-aware regulation block generates the control factor ($\boldsymbol{r}$) to modulate the global channel structural information of the $j$-th channel ($\boldsymbol{a}_j$), which is learnt by the global exploration block.
  • ...and 4 more figures
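The final step of the pipeline in Figure 2, assigning each query pixel to the prototype with the highest cosine similarity, can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; the function name `prototype_segment` and the tensor shapes are assumptions for exposition.

```python
import numpy as np

def prototype_segment(query_feat, prototypes):
    """Segment a query feature map by cosine similarity to class prototypes.

    query_feat: (C, H, W) deep features of the query image.
    prototypes: (K, C) one prototype per class (foreground/background).
    Returns an (H, W) map of predicted class indices.
    """
    C, H, W = query_feat.shape
    feats = query_feat.reshape(C, -1).T                                    # (H*W, C)
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)  # L2-normalise pixels
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    sim = feats @ protos.T                                                 # (H*W, K) cosine similarities
    return sim.argmax(axis=1).reshape(H, W)                                # per-pixel class label
```

Because both features and prototypes are L2-normalised, the dot product equals the cosine similarity, so the argmax picks the best-matching prototype at every pixel.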
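The channel-wise prototype fusion Ⓓ in Figure 4 can be illustrated with a toy sketch: each channel of the fused class prototype is a weighted mix of the corresponding channels of the cluster-mined prototypes. The function name and the use of a per-channel softmax over clusters are illustrative assumptions; the paper derives its weights from the probability maps $\phi(S_s)$.

```python
import numpy as np

def channelwise_fuse(cluster_protos, channel_logits):
    """Fuse Nc cluster prototypes into one class prototype, channel by channel.

    cluster_protos: (Nc, C) cluster-mined prototypes.
    channel_logits: (Nc, C) per-channel scores (e.g. pooled similarity maps).
    Returns a (C,) fused prototype: each channel is a softmax-weighted mix of
    the corresponding channels of the cluster prototypes.
    """
    # Softmax over the cluster axis, independently for every channel.
    e = np.exp(channel_logits - channel_logits.max(axis=0, keepdims=True))
    w = e / e.sum(axis=0, keepdims=True)     # (Nc, C) fusion weights
    return (w * cluster_protos).sum(axis=0)  # (C,) fused class prototype
```

Fusing channel-wise rather than averaging whole prototypes lets different clusters dominate different feature channels, which is how the multi-modal detail semantics can survive in a single class prototype.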