D-PerceptCT: Deep Perceptual Enhancement for Low-Dose CT Images

Taifour Yousra Nabila, Azeddine Beghdadi, Marie Luong, Zuheng Ming, Habib Zaidi, Faouzi Alaya Cheikh

TL;DR

D-PerceptCT addresses the challenge of preserving perceptually salient information in low-dose CT (LDCT) images while reducing noise. It integrates a Visual Dual-path Extractor (ViDex) that fuses semantic priors from DINOv2 with local detail processing, and a Deep Visual State Space (DV2SM) backbone with Global-Local State Space Blocks to capture multiscale, long-range features. A novel Deep Perceptual Relevancy Loss Function (DPRLF), informed by human contrast sensitivity, guides training toward perceptually important features. Evaluations on Mayo2016 show competitive results on traditional metrics and better perceptual quality than state-of-the-art methods, suggesting the approach could benefit radiologist-facing LDCT enhancement.

Abstract

Low-Dose Computed Tomography (LDCT) is widely used as an imaging solution to aid diagnosis and other clinical tasks. However, this comes at the price of a deterioration in image quality, because the radiation dose is lowered to reduce the risk of secondary cancer development. While some efficient methods have been proposed to enhance LDCT quality, many overestimate noise and perform excessive smoothing, leading to a loss of critical details. In this paper, we introduce D-PerceptCT, a novel architecture inspired by key principles of the Human Visual System (HVS) to enhance LDCT images. The objective is to guide the model to enhance or preserve perceptually relevant features, thereby providing radiologists with CT images in which critical anatomical structures and fine pathological details are perceptually visible. D-PerceptCT consists of two main blocks: (1) a Visual Dual-path Extractor (ViDex), which integrates semantic priors from a pretrained DINOv2 model with local spatial features, allowing the network to incorporate semantic awareness during enhancement; and (2) a Global-Local State-Space block that captures long-range information and multiscale features to preserve the important structures and fine details for diagnosis. In addition, we propose a novel deep perceptual loss, designated as the Deep Perceptual Relevancy Loss Function (DPRLF), which is inspired by human contrast sensitivity, to further emphasize perceptually important features. Extensive experiments on the Mayo2016 dataset demonstrate the effectiveness of the D-PerceptCT method for LDCT enhancement, showing better preservation of structural and textural information within LDCT images compared to state-of-the-art (SOTA) methods.
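
The abstract says DPRLF is inspired by human contrast sensitivity but does not state which contrast sensitivity model it uses or how the model enters the loss. As a rough illustration of the underlying idea only, the sketch below evaluates the classic Mannos-Sakrison contrast sensitivity function, a band-pass curve over spatial frequency, and normalizes it into per-frequency weights. The choice of this particular model, the function names, and the sample frequencies are all assumptions made for illustration and are not taken from the paper.

```python
import math


def csf_mannos_sakrison(f_cpd):
    """Relative contrast sensitivity at a spatial frequency in cycles/degree.

    Mannos-Sakrison (1974) model, used here only as an illustrative stand-in;
    the abstract does not say which CSF the DPRLF loss is built on.
    Expects a non-negative frequency.
    """
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-((0.114 * f_cpd) ** 1.1))


def csf_weights(freqs_cpd):
    """Normalize CSF values so the most visible frequency gets weight 1.0."""
    raw = [csf_mannos_sakrison(f) for f in freqs_cpd]
    peak = max(raw)
    return [r / peak for r in raw]


# Hypothetical frequency bands (cycles/degree) chosen for the example.
freqs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
weights = csf_weights(freqs)

for f, w in zip(freqs, weights):
    print(f"{f:5.1f} cpd -> weight {w:.3f}")
```

The curve is band-pass: mid frequencies receive the largest weight, while very low and very high frequencies are down-weighted. A perceptually motivated loss could use weights like these to penalize errors more heavily where the eye is most sensitive, though how DPRLF actually does this is specified in the paper body, not here.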

Paper Structure

This paper contains 11 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Architectural Diagram of the proposed Visual Dual-path Extractor Module (ViDex).
  • Figure 2: Visualization of DINOv2 embeddings via t-SNE for paired low-dose (red) and high-dose (blue) CT slices of patient L506 from the Mayo2016 dataset.
  • Figure 3: Representative blocks of DV2SM. (a) Multiscale Vision Block, (b) Global Attention Block, (c) Global-Local State Space Block (GL2SB).
  • Figure 4: Reconstructed LDCT images with their corresponding regions of interest. (a) Input LDCT, (b) DenoMamba [ozturk2024denomamba], (c) our method, and (d) HDCT image (ground truth).
  • Figure 5: Qualitative results on slice 58 of patient L506, comparing LDCT enhancement methods. (a) LDCT input; (b) HDCT (ground truth); (c)-(g) outputs of RED-CNN [chen2017low], WGAN [yang2018low], CTFormer [wang2023ctformer], DenoMamba [ozturk2024denomamba], and D-PerceptCT (ours), respectively. Zoomed ROIs are shown in each image. The display window is [-160, 240] HU, and absolute difference map intensities are scaled between 0 and 200 HU.
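
The display settings quoted in the Figure 5 caption correspond to standard CT windowing: Hounsfield unit (HU) values are clipped to the window and rescaled linearly for display. The following is a minimal sketch of that mapping, assuming a simple linear rescale to [0, 1]; the function names and sample values are hypothetical and are not taken from the paper's code.

```python
def window_hu(hu, lo=-160.0, hi=240.0):
    """Clip a HU value to [lo, hi] and rescale it linearly to [0, 1]."""
    clipped = min(max(hu, lo), hi)
    return (clipped - lo) / (hi - lo)


def abs_diff_display(pred_hu, ref_hu, max_diff=200.0):
    """Absolute HU difference, clipped at max_diff and rescaled to [0, 1]."""
    diff = abs(pred_hu - ref_hu)
    return min(diff, max_diff) / max_diff


# Hypothetical HU samples: air, window floor, soft tissue, window ceiling, bone.
row = [-1000.0, -160.0, 40.0, 240.0, 1200.0]
display = [window_hu(v) for v in row]
print(display)  # [0.0, 0.0, 0.5, 1.0, 1.0]
```

Values outside the window saturate to black or white, which is why a narrow soft-tissue window such as [-160, 240] HU makes subtle differences between enhancement methods easier to see.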