LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring

Zhihao Chen, Chuang Niu, Qi Gao, Ge Wang, Hongming Shan

TL;DR

LIT-Former tackles 3D CT imaging under low-dose and low longitudinal resolution by jointly performing in-plane denoising and through-plane deblurring. It blends (2+1)D convolutions with efficient multi-head self-attention in a U-shaped architecture, splitting 3D processing into in-plane and through-plane components via eMSM and eCFN blocks. The method achieves state-of-the-art performance on simulated and clinical datasets, offering better PSNR/RMSE/SSIM, superior CT-number accuracy, and competitive efficiency compared with 3D baselines and other transformer-augmented models. This approach enables faster, lower-dose CT acquisitions with preserved diagnostic quality and provides a principled framework for future 3D medical image restoration that balances global and local information while reducing computational burden.

Abstract

This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods have been developed in this context, they typically focus on 2D images and treat denoising for low-dose imaging and deblurring for super-resolution as separate tasks. To date, little work has been done on simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation and faster imaging speed. For this task, a straightforward method is to directly train an end-to-end 3D network. However, this demands much more training data and incurs high computational cost. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring, termed LIT-Former, which can efficiently synergize in-plane and through-plane sub-tasks for 3D CT imaging and enjoy the advantages of both convolution and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feedforward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture the global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution to extract the local information of 3D convolution in the same fashion. As a result, the proposed LIT-Former synergizes these two sub-tasks, significantly reducing the computational complexity compared to 3D counterparts and enabling rapid convergence. Extensive experimental results on simulated and clinical datasets demonstrate superior performance over state-of-the-art models. The source code is made available at https://github.com/hao1635/LIT-Former.
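
To give a rough sense of why splitting 3D operations into in-plane and through-plane parts reduces cost, the sketch below counts token pairs for plain token-wise self-attention and weights for plain convolutions. It is an illustrative back-of-the-envelope calculation, not the paper's eMSM or eCFN: the function names, the patch size (32 x 64 x 64), the channel width (16), and the kernel size (3) are all assumptions chosen for the example, and the actual modules may compute attention differently.

```python
def attention_pairs_3d(d, h, w):
    """Token pairs scored by full self-attention over all d*h*w voxels."""
    n = d * h * w
    return n * n


def attention_pairs_factorized(d, h, w):
    """Token pairs when attention is split into 2D in-plane and 1D through-plane."""
    in_plane = d * (h * w) ** 2       # each of the d slices attends over its h*w pixels
    through_plane = (h * w) * d ** 2  # each of the h*w pixel columns attends over d slices
    return in_plane + through_plane


def conv_weights_3d(c_in, c_out, k):
    """Weights of one k x k x k convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def conv_weights_2plus1d(c_in, c_out, k):
    """Weights of one k x k in-plane conv plus one length-k through-plane conv,
    both mapping c_in to c_out channels (bias ignored)."""
    return c_in * c_out * (k * k + k)


if __name__ == "__main__":
    d, h, w = 32, 64, 64  # hypothetical patch size (slices, height, width)
    full = attention_pairs_3d(d, h, w)
    split = attention_pairs_factorized(d, h, w)
    print(f"attention pairs: 3D={full}, factorized={split}, ratio={full / split:.1f}")

    c, k = 16, 3  # hypothetical channel width and kernel size
    print(f"conv weights: 3D={conv_weights_3d(c, c, k)}, "
          f"(2+1)D={conv_weights_2plus1d(c, c, k)}")
```

Under these assumptions the factorized count grows with `d*(h*w)^2 + (h*w)*d^2` rather than `(d*h*w)^2`, and the convolution weight count scales with `k^2 + k` rather than `k^3`, which is the kind of saving the abstract attributes to linking in-plane and through-plane processing.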

Paper Structure

This paper contains 29 sections, 7 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Overview of the proposed network architecture: (a) LIT-Former integrating in-plane and through-plane transformers, (b) the efficient multi-head self-attention module (eMSM), and (c) the efficient convolutional feed-forward network (eCFN). Dconv is short for depth-wise convolution.
  • Figure 2: Different types of convolutions in the eCFN block: (a) parallel and (b) cascaded convolutions.
  • Figure 3: Transverse CT images and difference images from the simulated dataset: (a) NDRCT; (b) Trilinear; (c) 3D-Unet [cciccek20163d]; (d) RED-CNN3D [chen2017low2]; (e) EDCNN3D [liang2020edcnn]; (f) IDD-net3D [liu2022low]; (g) TAM [liu2021tam]; (h) TAda [huang2021tada]; (i) BasicVSR++ [chan2022basicvsr++]; and (j) LIT-Former (ours). The zoomed ROI marked by the rectangle is shown below the full-size image. The display window is [-1350, 150] HU.
  • Figure 4: Transverse CT images and difference images from the clinical dataset: (a) NDRCT; (b) Trilinear; (c) 3D-Unet [cciccek20163d]; (d) RED-CNN3D [chen2017low2]; (e) EDCNN3D [liang2020edcnn]; (f) IDD-net3D [liu2022low]; (g) TAM [liu2021tam]; (h) TAda [huang2021tada]; (i) BasicVSR++ [chan2022basicvsr++]; and (j) LIT-Former (ours). The zoomed ROI marked by the rectangle is shown below the full-size image. The display window is [-160, 240] HU.
  • Figure 5: Coronal and sagittal CT images and difference images from the clinical dataset. The first two rows are coronal, and the next two rows are sagittal. (a) NDRCT; (b) Trilinear; (c) 3D-Unet [cciccek20163d]; (d) RED-CNN3D [chen2017low2]; (e) EDCNN3D [liang2020edcnn]; (f) IDD-net3D [liu2022low]; (g) TAM [liu2021tam]; (h) TAda [huang2021tada]; (i) BasicVSR++ [chan2022basicvsr++]; and (j) LIT-Former (ours). The ROI is shown at the bottom left of the full-size image. The display window is [-160, 240] HU.
  • ...and 7 more figures