Low-dose CT Denoising with Language-engaged Dual-space Alignment

Zhihao Chen, Tao Chen, Chenhui Wang, Chuang Niu, Ge Wang, Hongming Shan

TL;DR

Low-dose CT (LDCT) denoising often sacrifices texture and lacks interpretable guidance. LEDA introduces a plug-and-play loss that aligns the denoised LDCT image with its normal-dose CT (NDCT) counterpart in both a continuous perceptual space and a discrete semantic space. It does so with a frozen LLM-guided autoencoder built on VQGAN and a pyramid semantic loss that injects anatomical semantics into the token space. The LEDA loss combines a standard pixelwise loss with continuous feature alignment and discrete token alignment. Experiments on Mayo-2016 and Mayo-2020 show improved SSIM and FSIM over strong baselines while preserving texture and reducing artifacts. The discrete semantic tokens also provide language-level explanations of the anatomical content in the denoised images, adding explainability alongside the performance gains.

Abstract

While various deep learning methods have been proposed for low-dose computed tomography (CT) denoising, they often suffer from over-smoothing, blurring, and a lack of explainability. To alleviate these issues, we propose a plug-and-play Language-Engaged Dual-space Alignment loss (LEDA) to optimize low-dose CT denoising models. Our idea is to leverage large language models (LLMs) to align denoised CT and normal-dose CT images in both the continuous perceptual space and the discrete semantic space; this is the first LLM-based scheme for low-dose CT denoising. LEDA involves two steps. The first is to pretrain an LLM-guided CT autoencoder, which encodes a CT image into continuous high-level features and quantizes them into a token space to produce semantic tokens drawn from the LLM's vocabulary. The second is to minimize the discrepancy between the denoised CT images and the normal-dose CT images in terms of both the encoded high-level features and the quantized token embeddings produced by the LLM-guided CT autoencoder. Extensive experimental results on two public LDCT denoising datasets demonstrate that LEDA can enhance existing denoising models in both quantitative metrics and qualitative evaluation, and also provides explainability through language-level image understanding. Source code is available at https://github.com/hao1635/LEDA.
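
The two-step idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the tiny codebook, the equal loss weights, and the use of mean squared error for every term are all assumptions made for illustration. In the actual method the features would come from the frozen, pretrained LLM-guided autoencoder and the codebook from the LLM's vocabulary; here both are passed in directly as small lists.

```python
# Toy sketch of a dual-space alignment loss. All names, weights, and the
# choice of MSE are hypothetical; they are not taken from the LEDA code.
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mse(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Mean squared error over two equal-shaped lists of vectors."""
    total = sum(squared_distance(u, v) for u, v in zip(a, b))
    count = sum(len(u) for u in a)
    return total / count


def quantize(features: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each feature vector to its nearest codebook entry.

    Returns the token indices and the corresponding token embeddings.
    """
    indices = [
        min(range(len(codebook)),
            key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]
    return indices, [list(codebook[i]) for i in indices]


def dual_space_loss(denoised_pixels: Vector, target_pixels: Vector,
                    denoised_feats: Sequence[Vector],
                    target_feats: Sequence[Vector],
                    codebook: Sequence[Vector],
                    w_cont: float = 1.0, w_disc: float = 1.0) -> float:
    """Pixel term + continuous feature term + discrete token term."""
    pixel_term = mse([denoised_pixels], [target_pixels])
    # Continuous space: compare encoder features before quantization.
    cont_term = mse(denoised_feats, target_feats)
    # Discrete space: compare embeddings of the nearest codebook tokens.
    _, denoised_tokens = quantize(denoised_feats, codebook)
    _, target_tokens = quantize(target_feats, codebook)
    disc_term = mse(denoised_tokens, target_tokens)
    return pixel_term + w_cont * cont_term + w_disc * disc_term


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]          # hypothetical 2-token codebook
    loss = dual_space_loss(
        denoised_pixels=[0.0, 1.0], target_pixels=[0.0, 1.0],
        denoised_feats=[[0.1, 0.0], [0.9, 1.0]],
        target_feats=[[0.0, 0.0], [0.2, 0.1]],
        codebook=codebook,
    )
    print(round(loss, 4))
```

The sketch only computes the loss value. A trainable version would also need gradients to flow through the nearest-neighbour lookup, which vector-quantized models commonly handle with a straight-through estimator; that part is omitted here.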

Paper Structure

This paper contains 8 sections, 3 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Overview of the proposed LEDA. (a) Step 1: Training of the LLM-guided CT autoencoder; and (b) Step 2: Employment of LEDA for training the denoising model.
  • Figure 2: Transverse CT images from the Mayo-2016 dataset. The ROI marked by the rectangle is zoomed below for better visualization. The display window is [-160, 240] HU.
  • Figure 3: Explainability provided by the text tokens extracted from the LLM; only a subset of the quantized tokens is shown, for the first 2 layers of the quantizer.
  • Figure 4: Transverse CT images of ablation studies on the different components in LEDA.
  • Figure S1: The 3-layer token pyramid in our LLM-guided CT autoencoder. We select position in using a downsample ratio of 4.
  • ...and 1 more figure