Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT Reconstruction

Guoquan Wei, Liu Shi, Zekun Zhou, Wenzhe Shan, Qiegen Liu

TL;DR

TurnDiff tackles low-dose CT (LDCT) reconstruction without relying on paired or clean data by combining a projection-domain, self-supervised contextual sub-data denoising stage with a latent-diffusion image refinement module. A pixel-level, self-correcting fusion then blends the projection-domain prior with the LDCT image, guided by edge and noise confidences, to enable tunable generalization to higher, lower, and unseen doses. The method shows robust improvements over state-of-the-art baselines on the Mayo and LIDC-IDRI datasets and validates generalization on real clinical LDCT data, with ablations confirming the necessity of the dual-domain cascade, similarity enhancement, and per-pixel fusion. The approach offers a practical, clinically relevant self-supervised denoising solution that preserves lesion details while suppressing noise, potentially reducing radiation exposure without sacrificing diagnostic fidelity.

Abstract

Current deep learning models for low-dose CT (LDCT) denoising rely heavily on paired data and generalize poorly. Even diffusion models, which have attracted more attention, need to learn the distribution of clean data for reconstruction, a requirement that is difficult to satisfy in clinical practice. Meanwhile, self-supervised methods suffer a significant loss of generalizability when a model pre-trained at one dose is extended to other doses. To address these issues, this work proposes TUnable-geneRalizatioN Diffusion (TurnDiff), a method powered by self-supervised contextual sub-data for low-dose CT reconstruction. First, a contextual sub-data self-enhancing similarity strategy is designed for denoising in the LDCT projection domain, which provides an initial prior for the subsequent stages. Next, guided by this initial prior, knowledge distillation is deeply combined with latent diffusion models to optimize image details. The pre-trained model is then used for inference, and a pixel-level self-correcting fusion technique is proposed for fine-grained reconstruction in the image domain, enhancing image fidelity with the initial prior and the LDCT image as guides. In addition, the technique can be flexibly applied to generalize to higher and lower doses, or even unseen doses. As a dual-domain cascade strategy for self-supervised LDCT denoising, TurnDiff requires only LDCT projection-domain data for training and testing. Comprehensive evaluation on both benchmark datasets and real-world data demonstrates that TurnDiff consistently outperforms state-of-the-art methods in both reconstruction and generalization.
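
To make the idea of a pixel-level fusion guided by edge and noise confidences more concrete, the following is a minimal illustrative sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`edge_confidence`, `noise_confidence`, `fuse`), the use of gradient magnitude as the edge measure, local standard deviation as the noise measure, the weighting rule, and the `strength` parameter standing in for a tunable dose knob. The paper's actual fusion formula is not given in this summary.

```python
import numpy as np

def box_mean(img, k=3):
    """Mean over a k x k window with edge padding (k is assumed odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def edge_confidence(prior):
    """Hypothetical edge measure: gradient magnitude of the prior, scaled to [0, 1]."""
    gy, gx = np.gradient(prior.astype(np.float64))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

def noise_confidence(ldct, k=3):
    """Hypothetical noise measure: local standard deviation of the LDCT image, scaled to [0, 1]."""
    x = ldct.astype(np.float64)
    mean = box_mean(x, k)
    var = np.maximum(box_mean(x * x, k) - mean * mean, 0.0)
    std = np.sqrt(var)
    return std / (std.max() + 1e-8)

def fuse(prior, ldct, strength=1.0):
    """Per-pixel convex blend of the denoised prior and the LDCT image.

    Assumed rule: lean on the prior where the LDCT image looks noisy and the
    prior shows no edge; keep the LDCT pixel near edges. `strength` is an
    illustrative tuning knob, not a parameter taken from the paper.
    """
    edge = edge_confidence(prior)
    noise = noise_confidence(ldct)
    w = np.clip(strength * noise * (1.0 - edge), 0.0, 1.0)
    return w * prior + (1.0 - w) * ldct

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))    # stand-in for a denoised prior
    ldct = prior + 0.1 * rng.standard_normal((16, 16))     # stand-in for a noisy LDCT image
    fused = fuse(prior, ldct, strength=1.0)
    print(fused.shape)
```

Because the weight is clipped to [0, 1], each fused pixel lies between the corresponding prior and LDCT values, and setting `strength=0` returns the LDCT image unchanged. In this sketch, raising or lowering `strength` is the only mechanism for adapting to a different dose level.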

Paper Structure

This paper contains 28 sections, 21 equations, 12 figures, 3 tables, 1 algorithm.

Figures (12)

  • Figure 1: Visual demonstration of the differences among (a) Noise2Noise, (b) Neighbor2Neighbor, (c) Prompt-SID and (d) TurnDiff.
  • Figure 2: General Architecture of TurnDiff Training and Inference. (a) The training of TurnDiff. (b) The inference of TurnDiff.
  • Figure 3: Description of random sampling of contextual projection data.
  • Figure 4: Tunable up- and down-dose generalization strategy.
  • Figure 5: (a1), (a2): noisy images; (b1), (b2): the edges extracted from $\hat{x}_{0}$; (c1), (c2): the noise level estimated from $x_{ld}$.
  • ...and 7 more figures