LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring

Zhihao Chen; Chuang Niu; Qi Gao; Ge Wang; Hongming Shan

TL;DR

LIT-Former targets 3D CT volumes acquired at low dose and with low longitudinal (through-plane) resolution by jointly performing in-plane denoising and through-plane deblurring. It combines (2+1)D convolutions with efficient multi-head self-attention in a U-shaped architecture, splitting 3D processing into in-plane and through-plane components via eMSM and eCFN blocks. On simulated and clinical datasets, the method is reported to reach state-of-the-art PSNR, RMSE, and SSIM, with better CT-number accuracy and competitive efficiency compared with 3D baselines and other transformer-augmented models. The approach could enable faster, lower-dose CT acquisitions with preserved diagnostic quality, and it offers a framework for 3D medical image restoration that balances global and local information while reducing the computational burden.

Abstract

This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods have been developed in this context, they typically focus on 2D images and treat denoising (needed because of the low dose) and deblurring (for super-resolution) as separate tasks. To date, little work has addressed simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation and faster imaging speed. A straightforward approach to this task is to train an end-to-end 3D network directly. However, this demands much more training data and incurs high computational costs. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring. The resulting model, termed LIT-Former, efficiently synergizes the in-plane and through-plane sub-tasks for 3D CT imaging and enjoys the advantages of both convolution and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feed-forward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture the global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution to extract the local information of 3D convolution in the same fashion. As a result, the proposed LIT-Former synergizes these two sub-tasks, significantly reducing the computational complexity compared with 3D counterparts and enabling rapid convergence. Extensive experimental results on simulated and clinical datasets demonstrate superior performance over state-of-the-art models. The source code is available at https://github.com/hao1635/LIT-Former.
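
To give a rough sense of why splitting 3D operations into in-plane and through-plane parts reduces cost, the following back-of-the-envelope sketch counts convolution weights and attention token pairs. It is an illustration under simplifying assumptions, not the authors' implementation: it assumes dense (not depth-wise) convolutions with equal input and output channels in both branches, and token-wise attention over voxel positions. The actual eMSM and eCFN designs may differ. All function names are hypothetical.

```python
def conv_weights_3d(k: int, c_in: int, c_out: int) -> int:
    """Weights in one dense k x k x k convolution (bias ignored)."""
    return k * k * k * c_in * c_out


def conv_weights_2plus1d(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k in-plane convolution plus a k-tap through-plane
    convolution, each mapping c_in -> c_out (assumed layout, bias ignored)."""
    return (k * k + k) * c_in * c_out


def attention_pairs_3d(d: int, h: int, w: int) -> int:
    """Token pairs scored by full self-attention over all d*h*w voxels."""
    n = d * h * w
    return n * n


def attention_pairs_factorized(d: int, h: int, w: int) -> int:
    """Token pairs scored when attention is split into a 2D in-plane pass
    (within each of the d slices) and a 1D through-plane pass
    (along depth at each of the h*w positions)."""
    in_plane = d * (h * w) ** 2
    through_plane = (h * w) * d ** 2
    return in_plane + through_plane


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    k, c = 3, 32
    d, h, w = 8, 16, 16
    print("3D conv weights:      ", conv_weights_3d(k, c, c))
    print("(2+1)D conv weights:  ", conv_weights_2plus1d(k, c, c))
    print("3D attention pairs:   ", attention_pairs_3d(d, h, w))
    print("factorized attn pairs:", attention_pairs_factorized(d, h, w))
```

For these illustrative sizes, the factorized convolution uses 12 weights per channel pair instead of 27, and the factorized attention scores roughly one eighth as many token pairs as full 3D attention. The gap widens as the volume grows, since the full 3D cost scales with the square of the total voxel count.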

Paper Structure (29 sections, 7 equations, 12 figures, 8 tables)

Introduction
Methods
Overall Framework of LIT-Former
Efficient Multi-Head Self-Attention Modules
In-plane branch of eMSM (eMSM-I)
Through-plane branch of eMSM (eMSM-T)
Efficient Convolutional Feed-Forward Networks
Loss Function
Experiments
Datasets
Simulated dataset
Clinical dataset
Implementation Details
Compared Methods
Quantitative Evaluations
...and 14 more sections

Figures (12)

Figure 1: Overview of the proposed network architecture: (a) LIT-Former, integrating in-plane and through-plane transformers; (b) the efficient multi-head self-attention module (eMSM); and (c) the efficient convolutional feed-forward network (eCFN). Dconv is short for depth-wise convolution.
Figure 2: Different types of convolutions in the eCFN block: (a) parallel and (b) cascaded convolutions.
Figure 3: Transverse CT images and difference images from the simulated dataset: (a) NDRCT; (b) Trilinear; (c) 3D-Unet cciccek20163d; (d) RED-CNN3D chen2017low2; (e) EDCNN3D liang2020edcnn; (f) IDD-net3D liu2022low; (g) TAM liu2021tam; (h) TAda huang2021tada; (i) BasicVSR++ chan2022basicvsr++; and (j) LIT-Former (ours). The zoomed ROI marked by the rectangle is shown below the full-size image. The display window is [-1350, 150] HU.
Figure 4: Transverse CT images and difference images from the clinical dataset: (a) NDRCT; (b) Trilinear; (c) 3D-Unet cciccek20163d; (d) RED-CNN3D chen2017low2; (e) EDCNN3D liang2020edcnn; (f) IDD-net3D liu2022low; (g) TAM liu2021tam; (h) TAda huang2021tada; (i) BasicVSR++ chan2022basicvsr++; and (j) LIT-Former (ours). The zoomed ROI marked by the rectangle is shown below the full-size image. The display window is [-160, 240] HU.
Figure 5: Coronal and sagittal CT images and difference images from the clinical dataset. The first two rows are coronal, and the next two rows are sagittal. (a) NDRCT; (b) Trilinear; (c) 3D-Unet cciccek20163d; (d) RED-CNN3D chen2017low2; (e) EDCNN3D liang2020edcnn; (f) IDD-net3D liu2022low; (g) TAM liu2021tam; (h) TAda huang2021tada; (i) BasicVSR++ chan2022basicvsr++; and (j) LIT-Former (ours). The ROI is shown at the bottom left of the full-size image. The display window is [-160, 240] HU.
...and 7 more figures
