SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation

Yifei Chen; Binfeng Zou; Zhaoxin Guo; Yiyu Huang; Yifan Huang; Feiwei Qin; Qinhai Li; Changmiao Wang

TL;DR

PE segmentation in CTPA is complicated by image noise and by the need to capture both local and global features. SCUNet++ combines a Swin Transformer encoder with a CNN bottleneck and multi-fusion dense skip connections to preserve spatial detail while exploiting global context. It reports state-of-the-art results on FUMPE and CAD-PE, with a $DSC$ of about $0.834$ on both datasets and an $HD95$ of $3.83$ and $5.10$ respectively, outperforming UNet, UNet++, Swin-UNet, and ResD-UNet. More accurate automatic PE segmentation of this kind could assist clinical decision-making.

Abstract

Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noise that resembles PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully account for the hierarchical structure of features or for the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method uses the Swin Transformer as the encoder and incorporates multiple fusion dense skip connections between the encoder and decoder. It also fuses features of different scales in the decoder subnetwork to compensate for the spatial information loss caused by the inevitable downsampling in Swin-UNet and other state-of-the-art methods, which addresses the limitations above. We provide a detailed theoretical analysis of this method and validate it on the publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic PE segmentation and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
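To make the two reported metrics concrete, the following is a minimal pure-Python sketch of how DSC and HD95 can be computed for a pair of 2D binary masks. It is an illustration, not the authors' evaluation code: the function names `dice`, `boundary`, and `hd95` are hypothetical, the boundary is taken with a 4-neighbourhood, distances are in pixels, and the 95th percentile uses the nearest-rank rule over the pooled distances from both directions. Other toolkits may make different choices, so absolute values can differ slightly.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient of two binary masks (lists of rows of 0/1)."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def boundary(mask):
    """Foreground pixels touching background or the image edge (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts


def hd95(pred, truth):
    """95th-percentile symmetric boundary distance, in pixels (nearest-rank)."""
    a, b = boundary(pred), boundary(truth)
    if not a or not b:
        raise ValueError("HD95 is undefined when either mask is empty")
    # Pool the nearest-neighbour distances from both directions.
    d = [min(math.dist(p, q) for q in b) for p in a]
    d += [min(math.dist(q, p) for p in a) for q in b]
    d.sort()
    rank = math.ceil(0.95 * len(d))  # nearest-rank percentile (assumption)
    return d[rank - 1]


if __name__ == "__main__":
    pred = [[0, 0, 0, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 1, 1, 0, 0],
            [0, 0, 0, 0, 0]]
    truth = [[0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    print(dice(pred, truth), hd95(pred, truth))
```

A higher DSC means more overlap with the ground truth, while a lower HD95 means the predicted boundary stays closer to the true one, which is why the abstract reports both.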

Paper Structure (17 sections, 2 equations, 7 figures, 5 tables)

This paper contains 17 sections, 2 equations, 7 figures, 5 tables.

Introduction
Related Work
Method
Overview of the Architecture
Double Swin-Transformer Block
Encoder
Patch Merging Layer
Bottleneck (CNN Block)
Decoder
Patch Expanding Layer
Multi-Fusion Dense Skip Connection
Experiment
Dataset
Implementation Details
Comparison with Typical Segmentation Models
...and 2 more sections

Figures (7)

Figure 1: The test results are presented in the following order: from left to right, the input images, the output segmentations, and the ground truth.
Figure 2: Overall structure of the network model. This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder. Additionally, a CNN is used in the bottleneck, together with Multi-Fusion Dense Skip Connections, to compensate for the Transformer's shortcomings in local spatial feature extraction.
Figure 3: Swin-Transformer module. MSA denotes the multi-head self-attention module and MLP represents the multilayer perceptron module.
Figure 4: Multi-Fusion Dense Skip Connection module.
Figure 5: Original PE dataset. Figures (a-c) present examples from the CAD-PE dataset, while figures (d-f) showcase examples from the FUMPE dataset.
...and 2 more figures
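
The dense skip connections named in Figures 2 and 4 can be pictured as a wiring graph. The sketch below enumerates that graph under the standard UNet++ rule, in which a nested node at level i and column j fuses every earlier node on the same level with the upsampled node one level deeper. This is an assumption made for illustration: the exact wiring of the Multi-Fusion Dense Skip Connection module is defined by the paper's figures and may differ, and the function name `dense_skip_inputs` is hypothetical.

```python
def dense_skip_inputs(depth):
    """Map each nested node (level i, column j) to the nodes it fuses.

    Assumed UNet++ rule: for j >= 1, node (i, j) concatenates the same-level
    nodes (i, 0) .. (i, j - 1) with the upsampled node (i + 1, j - 1).
    Column 0 holds the encoder (backbone) outputs and has no skip inputs.
    """
    graph = {}
    for j in range(1, depth):
        for i in range(depth - j):
            same_level = [(i, k) for k in range(j)]
            graph[(i, j)] = same_level + [(i + 1, j - 1)]
    return graph


if __name__ == "__main__":
    for node, sources in sorted(dense_skip_inputs(3).items()):
        print(node, "<-", sources)
```

With `depth=3`, the top-level output node `(0, 2)` receives features from `(0, 0)`, `(0, 1)`, and `(1, 1)`, which shows how features from several scales reach the decoder rather than only one encoder stage.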
