SFB-net for cardiac segmentation: Bridging the semantic gap with attention

Nicolas Portal, Nadjia Kachenoura, Thomas Dietenbeck, Catherine Achard

TL;DR

The Swin Filtering Block network (SFB-net) combines conventional and Swin transformer layers: the former introduce spatial attention at the bottom of the network, while the latter focus on high-level, semantically rich features between the encoder and decoder. The method generalizes well to data from different vendors and centres.

Abstract

In the past few years, deep learning algorithms have been widely used for cardiac image segmentation. However, most of these architectures rely on convolutions, which struggle to model long-range dependencies, limiting their ability to extract contextual information. To tackle this issue, this article introduces the Swin Filtering Block network (SFB-net), which takes advantage of both conventional and Swin transformer layers. The former are used to introduce spatial attention at the bottom of the network, while the latter are applied to focus on high-level, semantically rich features between the encoder and decoder. An average Dice score of 92.4 was achieved on the ACDC dataset. To the best of our knowledge, this result outperforms any other work on this dataset. The average Dice score of 87.99 obtained on the M&M's dataset demonstrates that the proposed method generalizes well to data from different vendors and centres.
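
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an average Dice score over several structures is commonly computed from two integer label maps. It is not taken from the paper: the function names `dice_score` and `mean_dice`, the toy arrays, the label values 1 to 3, and the convention for empty masks are all assumptions made for illustration. The result lies in [0, 1]; multiplying by 100 gives the percentage scale used in the abstract.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for a single label."""
    pred_mask = pred == label
    target_mask = target == label
    denom = pred_mask.sum() + target_mask.sum()
    if denom == 0:
        # Assumed convention: a label absent from both maps counts as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred_mask, target_mask).sum()
    return 2.0 * intersection / denom


def mean_dice(pred, target, labels):
    """Average of the per-label Dice coefficients (background is excluded by the caller)."""
    return float(np.mean([dice_score(pred, target, label) for label in labels]))


# Toy 3x3 label maps with hypothetical labels 1, 2, 3 (0 is background).
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 3]])
target = np.array([[0, 1, 1],
                   [0, 2, 0],
                   [0, 3, 3]])

average = mean_dice(pred, target, labels=[1, 2, 3])
print(f"mean Dice: {100 * average:.2f}")
```

In this toy case label 1 matches exactly, while labels 2 and 3 each overlap on one pixel out of three counted across both masks, so the average falls below 100.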

Paper Structure

This paper contains 13 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Representation of SFB-net. Deep supervision is used in the decoder. Convolutional blocks are represented in blue, Swin Filtering Blocks (SFB) in yellow, and the conventional transformer layer in purple.
  • Figure 2: Schematic representation of our Swin Filtering Blocks (SFB) used between the encoder and the decoder.