XctDiff: Reconstruction of CT Images with Consistent Anatomical Structures from a Single Radiographic Projection Image

Qingze Bai, Tiange Liu, Zhi Liu, Yubing Tong, Drew Torigian, Jayaram Udupa

TL;DR

XctDiff tackles single-view CT reconstruction by learning robust 3D priors from 2D radiographs and guiding diffusion-based CT generation in a latent space. It introduces a 3D perceptual compression model, a progressive semantic encoder for 3D priors, and a prior-guided diffusion model with cross-attention, aided by a homogeneous spatial codebook. Trained on DRR-generated radiographs and adapted to real radiographs through style transfer, it achieves state-of-the-art PSNR and SSIM on the LIDC-IDRI dataset while preserving anatomical consistency and reducing blur. The approach also shows promise for self-supervised pretraining in medical image analysis and downstream tasks, highlighting practical impact for data-limited clinical contexts.
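
The prior-guided step mentioned above can be illustrated with a generic single-head cross-attention, where queries come from the CT latent and keys and values come from the radiograph-derived prior. This is a minimal NumPy sketch, not the authors' implementation: the function name `cross_attention`, the token counts, the feature widths, and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latent_tokens, prior_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, hypothetical names).

    latent_tokens: (n_latent, d_latent) flattened CT latent features
    prior_tokens:  (n_prior, d_prior) flattened prior features
    """
    q = latent_tokens @ w_q  # queries from the CT latent
    k = prior_tokens @ w_k   # keys from the prior
    v = prior_tokens @ w_v   # values from the prior
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each latent token attends over prior tokens
    return weights @ v, weights


# Toy shapes chosen only for the example.
rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16))
prior = rng.normal(size=(5, 32))
w_q = rng.normal(size=(16, 16))
w_k = rng.normal(size=(32, 16))
w_v = rng.normal(size=(32, 16))

out, weights = cross_attention(latent, prior, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to one, so every latent token receives a convex combination of the prior's value vectors. In a diffusion model this block would sit inside the denoising network and be applied at every denoising step.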

Abstract

In this paper, we present XctDiff, an algorithmic framework for reconstructing CT from a single radiograph, which decomposes the reconstruction process into two easily controllable tasks: feature extraction and CT reconstruction. Specifically, we first design a progressive feature extraction strategy that extracts robust 3D priors from radiographs. Then, we use the extracted prior information to guide the CT reconstruction in the latent space. Moreover, we design a homogeneous spatial codebook to further improve the reconstruction quality. The experimental results show that our proposed method achieves state-of-the-art reconstruction performance and overcomes the blurring issue. We also apply XctDiff to a self-supervised pre-training task. Its effectiveness there indicates that it has promising additional applications in medical image analysis. The code is available at: https://github.com/qingze-bai/XctDiff
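
For readers unfamiliar with codebooks, the following sketch shows a standard vector-quantization lookup, in which each latent vector is replaced by its nearest codebook entry. It illustrates only the generic mechanism. The homogeneous spatial codebook described in the abstract is the authors' own design and is not reproduced here, and the function name `quantize` and the toy values are assumptions.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (generic VQ).

    latents:  (n, d) array of latent vectors
    codebook: (k, d) array of code vectors
    """
    # Squared Euclidean distance between every latent and every code: (n, k).
    distances = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)
    return codebook[indices], indices


# Toy 2D example with three codes.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.7]])

quantized, indices = quantize(latents, codebook)
print(indices)  # [0 1 2]
```

In a trained model the codebook is learned jointly with the encoder and decoder, and the lookup is applied at every spatial position of the 3D latent volume.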

Paper Structure (11 sections, 7 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: XctDiff uses a progressive semantic encoder to extract 3D anatomical priors from the input radiograph. The extracted features are then used to guide CT reconstruction in latent space. Finally, the reconstructed CT feature maps are used to generate high-quality CT images with consistent anatomical structures after passing through a vector quantization encoder. Note that the radiographs used in training were generated with Digitally Reconstructed Radiography (DRR) technology. The radiographs used at inference are real radiographs converted by a style transfer model.
  • Figure 2: (a) The progressive encoder first approximates the rough shape of the human body in the coronal plane only, then learns more accurate 3D anatomical representations through multiple successive convolutional layers. (b) Different styles of radiographs. (Left) Real radiographs from ChestXray2017. (Middle) Synthesized radiographs used for training and evaluation. (Right) Real radiographs converted to the synthesized style.
  • Figure 3: Qualitative visualization results on the LIDC-IDRI dataset. The transverse plane, sagittal plane, and coronal plane of the reconstruction results are shown.
  • Figure 4: Dice score gaps between pre-training with reconstructed CT, pre-training with real CT, and a scratch model on the BCV and MSD datasets.
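
The Dice score reported in Figure 4 measures the overlap between a predicted and a reference segmentation mask. The sketch below shows the standard binary definition, 2|A ∩ B| / (|A| + |B|). The function name `dice_score` and the toy masks are assumptions, and the paper's evaluation code may handle multi-class labels or empty masks differently.

```python
import numpy as np


def dice_score(pred, target):
    """Binary Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / denom


# Toy 2D masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

score = dice_score(pred, target)
print(round(score, 4))  # 0.6667
```

A "Dice score gap" as in Figure 4 would then be the difference between two such scores, for example between a model pre-trained on reconstructed CT and one trained from scratch.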