Hybrid Swin Attention Networks for Simultaneously Low-Dose PET and CT Denoising
Yichao Liu, Hengzhi Xue, YueYang Teng, Junwen Guo
TL;DR
HSANet denoises low-dose CT and PET images by combining CNN residual blocks with Swin Transformer blocks in a hierarchical encoder–decoder. It introduces efficient ESGA and EPGA attention modules, a patch-expanding (HIC) upsampling strategy, and a Sobel-enhanced loss that preserves edges while suppressing noise. On a public dataset, HSANet achieves top-tier PSNR/SSIM and low RMSE with substantially fewer parameters than many baselines, which makes it practical for GPU-constrained clinical deployment. The work also demonstrates scalable improvements and provides a path toward noise-aware PET modeling and public code release.
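The summary above names a Sobel-enhanced loss but does not give its form. As a rough illustration only, one common way to build such a loss is to add an edge term, computed on Sobel gradient magnitudes, to a pixel-wise term. The L1 pixel term, the `edge_weight` value, and all function names below are assumptions made for this sketch, not the authors' formulation.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude on the interior pixels of a 2D image (list of rows).

    Border pixels are skipped, so the input must be at least 3x3.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D images."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)


def sobel_enhanced_loss(pred, target, edge_weight=0.1):
    """Pixel-wise L1 term plus a weighted L1 term on Sobel edge maps.

    edge_weight is a hypothetical hyperparameter chosen for illustration.
    """
    pixel_term = mean_abs_diff(pred, target)
    edge_term = mean_abs_diff(sobel_magnitude(pred), sobel_magnitude(target))
    return pixel_term + edge_weight * edge_term


# Usage: a target with a sharp vertical edge versus a flat, over-smoothed guess.
target = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
blurred = [[0.5] * 4 for _ in range(4)]
loss = sobel_enhanced_loss(blurred, target)
```

The edge term penalizes a prediction that matches intensities on average but loses the edge, which is the behavior the summary attributes to the Sobel-enhanced loss. In a real training setup the same idea would typically be written with a tensor library so that gradients flow through the convolution.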
Abstract
Low-dose computed tomography (LDCT) and low-dose positron emission tomography (PET) have emerged as safer alternatives to conventional imaging protocols because they significantly reduce radiation exposure. However, this reduction often increases noise and artifacts, which can compromise diagnostic accuracy. Consequently, denoising for LDCT/PET has become a vital area of research aimed at enhancing image quality while maintaining radiation safety. In this study, we introduce a novel Hybrid Swin Attention Network (HSANet), which incorporates Efficient Global Attention (EGA) modules and a hybrid upsampling module. The EGA modules enhance both spatial and channel-wise interaction, improving the network's capacity to capture relevant features, while the hybrid upsampling module mitigates the risk of overfitting to noise. We validate the proposed approach on a publicly available LDCT/PET dataset. Experimental results demonstrate that HSANet achieves superior denoising performance compared to existing methods while maintaining a lightweight model size suitable for deployment on GPUs with standard memory configurations. This combination of accuracy and compactness makes our approach highly practical for real-world clinical applications.
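Denoising performance for HSANet is summarized in terms of PSNR, SSIM, and RMSE. As a small, non-authoritative sketch of how two of these metrics are conventionally computed, the snippet below implements RMSE and PSNR for 2D images stored as lists of rows. The function names and the `data_range` default of 1.0 (images scaled to [0, 1]) are assumptions for illustration; the evaluation code used in the paper may normalize differently.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized 2D images."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    mse = sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)
    return math.sqrt(mse)


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed maximum possible intensity span of the images.
    """
    err = rmse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 20 * math.log10(data_range / err)


# Usage: every pixel of the estimate is off by 0.1 on a [0, 1] scale.
reference = [[0.0, 0.0], [1.0, 1.0]]
estimate = [[0.1, 0.1], [0.9, 0.9]]
error = rmse(estimate, reference)
quality_db = psnr(estimate, reference)
```

Lower RMSE and higher PSNR both indicate a denoised image closer to the full-dose reference. SSIM is omitted here because it requires local windowed statistics and is better taken from an established image-processing library.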
