Hybrid Swin Attention Networks for Simultaneously Low-Dose PET and CT Denoising

Yichao Liu, Hengzhi Xue, YueYang Teng, Junwen Guo

TL;DR

HSANet addresses the challenge of denoising low-dose CT and PET images by fusing CNN residual blocks with Swin Transformer blocks in a hierarchical encoder–decoder. It introduces efficient ESGA and EPGA attention modules, a patch expanding (HIC) upsampling strategy, and a Sobel-enhanced loss to preserve edges while suppressing noise. Public dataset evaluations show HSANet achieves top-tier PSNR/SSIM with low RMSE and substantially fewer parameters than many baselines, highlighting its practicality for GPU-constrained clinical deployment. The work also demonstrates scalable improvements and provides a path toward noise-aware PET modeling and public code release.
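The summary above does not give the exact form of the Sobel-enhanced loss, so the following is only a minimal NumPy sketch of the general idea: a pixel-wise term plus a penalty on the difference between Sobel gradient magnitudes. The function names, the choice of MSE for the pixel term, the L1 edge term, and the `edge_weight` value are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def sobel_magnitude(image):
    """Gradient magnitude of a 2D image from the two Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def sobel_enhanced_loss(pred, target, edge_weight=0.1):
    """Hypothetical loss: MSE plus a weighted L1 term on Sobel edge maps.

    edge_weight is an assumed hyperparameter, not a value from the paper.
    """
    pixel_term = np.mean((pred - target) ** 2)
    edge_term = np.mean(np.abs(sobel_magnitude(pred) - sobel_magnitude(target)))
    return pixel_term + edge_weight * edge_term
```

With a loss of this shape, a prediction that matches the target's mean intensity but blurs its boundaries is still penalized through the edge term, which is the motivation for adding a gradient-based component to a denoising objective.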

Abstract

Low-dose computed tomography (LDCT) and positron emission tomography (PET) have emerged as safer alternatives to conventional imaging modalities by significantly reducing radiation exposure. However, this reduction often results in increased noise and artifacts, which can compromise diagnostic accuracy. Consequently, denoising for LDCT/PET has become a vital area of research aimed at enhancing image quality while maintaining radiation safety. In this study, we introduce a novel Hybrid Swin Attention Network (HSANet), which incorporates Efficient Global Attention (EGA) modules and a hybrid upsampling module. The EGA modules enhance both spatial and channel-wise interaction, improving the network's capacity to capture relevant features, while the hybrid upsampling module mitigates the risk of overfitting to noise. We validate the proposed approach using a publicly available LDCT/PET dataset. Experimental results demonstrate that HSANet achieves superior denoising performance compared to existing methods, while maintaining a lightweight model size suitable for deployment on GPUs with standard memory configurations. This makes our approach highly practical for real-world clinical applications.
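To make the idea of combined channel-wise and spatial interaction concrete, here is a generic NumPy sketch that gates a feature map first per channel and then per pixel. It is not the EGA/ESGA design from the paper: the pooling choices, the sigmoid gates, and the `channel_weights` matrix (a stand-in for learned parameters) are assumptions chosen only to illustrate sequential channel and spatial attention.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_then_spatial_attention(features, channel_weights):
    """Gate a (C, H, W) feature map per channel, then per pixel.

    channel_weights is a hypothetical (C, C) matrix standing in for
    learned parameters; it is not taken from the paper.
    """
    # Channel attention: squeeze each channel to one value, mix, and gate.
    pooled = features.mean(axis=(1, 2))                # shape (C,)
    channel_gate = sigmoid(channel_weights @ pooled)   # shape (C,)
    features = features * channel_gate[:, None, None]

    # Spatial attention: one gate per pixel from the channel-averaged map.
    spatial_gate = sigmoid(features.mean(axis=0))      # shape (H, W)
    return features * spatial_gate[None, :, :]
```

Because both gates lie in (0, 1), the output keeps the input's shape and only rescales it, which is why modules of this kind can be dropped into a residual branch without changing tensor dimensions.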

Paper Structure

This paper contains 18 sections, 5 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: The structure of HSANet. It consists of two residual convolutional blocks used in the encoder and two in the decoder, along with a central residual encoder-decoder block. The residual convolutional blocks incorporate the proposed ESGA module within their residual structure. The encoder-decoder block is designed to learn hierarchical representations from LDCT images. It includes Swin Transformer blocks for both encoding and decoding, a patch merging module in the encoder, a patch expanding module in the decoder, skip connections, and an embedded ESGA module for enhanced information fusion.
  • Figure 2: Structure of the ESGA module. The module applies channel attention followed by spatial attention. The GELU activation function is drawn with a dashed line: the ESGA module includes GELU when it replaces the MLP in a Swin Transformer block, and omits it when used in the residual convolutional blocks.
  • Figure 3: HIC patch expanding module. LN denotes layer normalization. We adopt nearest-neighbor interpolation. White boxes represent zeros. We expand the feature size by interleaving zeros between columns and rows.
  • Figure 4: Quantitative (a) PSNR, (b) SSIM, and (c) RMSE of different models over 8 runs. Red points mark the averages. The width of each violin plot represents the density of the data at each value. Quartiles are shown as thick lines inside each violin plot.
  • Figure 5: Comparison of results on a pelvis image. (a) LDCT, (b) RED-CNN, (c) Swin-Unet, (d) SwinIR, (e) CTformer, (f) Unet, (g) HSANet, (h) FDCT.
  • ...and 4 more figures
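
The Figure 3 caption names two ways of enlarging a feature map: nearest-neighbor interpolation and interleaving zeros between rows and columns. The sketch below shows both operations on a single 2D map in NumPy. How the HIC module combines them with layer normalization is not described in the caption, so that part is left out; the function names and the `factor` parameter are illustrative assumptions.

```python
import numpy as np


def nearest_upsample(x, factor=2):
    """Enlarge a 2D map by repeating each value factor times per axis."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def zero_interleave(x, factor=2):
    """Enlarge a 2D map by placing values on a strided grid of zeros."""
    h, w = x.shape
    out = np.zeros((h * factor, w * factor), dtype=x.dtype)
    out[::factor, ::factor] = x
    return out


x = np.array([[1, 2],
              [3, 4]])
print(nearest_upsample(x))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
print(zero_interleave(x))
# [[1 0 2 0]
#  [0 0 0 0]
#  [3 0 4 0]
#  [0 0 0 0]]
```

Nearest-neighbor upsampling copies every input value, noise included, into a block of output pixels, whereas zero interleaving keeps each value at one location and leaves the gaps empty for later layers to fill.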