LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring
Zhihao Chen, Chuang Niu, Qi Gao, Ge Wang, Hongming Shan
TL;DR
LIT-Former tackles 3D CT imaging at low dose and low longitudinal resolution by jointly performing in-plane denoising and through-plane deblurring. It blends (2+1)D convolutions with efficient multi-head self-attention in a U-shaped architecture, splitting 3D processing into in-plane and through-plane components via eMSM and eCFN blocks. The method achieves state-of-the-art performance on simulated and clinical datasets, with better PSNR/RMSE/SSIM, more accurate CT numbers, and competitive efficiency compared with 3D baselines and other transformer-augmented models. This approach enables faster, lower-dose CT acquisitions with preserved diagnostic quality and provides a principled framework for future 3D medical image restoration that balances global and local information while reducing computational burden.
Abstract
This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods have been developed in this context, they typically focus on 2D images and treat denoising (for low dose) and deblurring (for super-resolution) separately. To date, little work has been done on simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation and faster imaging speed. For this task, a straightforward method is to directly train an end-to-end 3D network. However, this demands much more training data and incurs high computational cost. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring, termed LIT-Former, which can efficiently synergize the in-plane and through-plane sub-tasks for 3D CT imaging and enjoy the advantages of both convolution and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feedforward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture the global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution to extract the local information of 3D convolution in the same fashion. As a result, the proposed LIT-Former synergizes these two sub-tasks, significantly reducing the computational complexity compared with 3D counterparts and enabling rapid convergence. Extensive experimental results on simulated and clinical datasets demonstrate superior performance over state-of-the-art models. The source code is available at https://github.com/hao1635/LIT-Former.
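To give a rough sense of why splitting 3D operations into an in-plane 2D part and a through-plane 1D part reduces cost, the following back-of-the-envelope sketch counts attention token pairs and convolution weights for both variants. It is an illustration only, not the authors' implementation: it assumes plain token-wise self-attention, equal input and output channel counts, and no biases, and all function names and the example sizes (a 16 x 32 x 32 volume, kernel size 3, 32 channels) are hypothetical. The actual eMSM and eCFN blocks may be built differently.

```python
def attention_pairs_3d(d, h, w):
    """Token pairs scored by full 3D self-attention over a d x h x w volume."""
    n = d * h * w
    return n * n


def attention_pairs_factorized(d, h, w):
    """Token pairs scored when attention is split into two parts (assumed scheme):
    in-plane 2D attention within each of the d slices, plus
    through-plane 1D attention along depth at each of the h*w positions."""
    in_plane = d * (h * w) ** 2
    through_plane = (h * w) * d ** 2
    return in_plane + through_plane


def conv_weights_3d(k, c):
    """Weights of one k x k x k convolution with c input and c output channels."""
    return k ** 3 * c * c


def conv_weights_2plus1d(k, c):
    """Weights of one k x k in-plane convolution plus one length-k
    through-plane convolution, each with c input and c output channels."""
    return (k ** 2 + k) * c * c


if __name__ == "__main__":
    d, h, w = 16, 32, 32  # hypothetical feature-map size
    k, c = 3, 32          # hypothetical kernel size and channel count

    full = attention_pairs_3d(d, h, w)
    split = attention_pairs_factorized(d, h, w)
    print(f"attention pairs: 3D={full}, factorized={split}, ratio={full / split:.2f}")

    w3d = conv_weights_3d(k, c)
    w21 = conv_weights_2plus1d(k, c)
    print(f"conv weights:    3D={w3d}, (2+1)D={w21}, ratio={w3d / w21:.2f}")
```

Under these assumptions, the factorized attention scores roughly one sixteenth as many token pairs as full 3D attention for the example volume, and the (2+1)D convolution uses 12 weights per channel pair instead of 27. The gap in attention cost grows with depth, since the full 3D count scales with the square of d while the dominant in-plane term scales linearly with d.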
