DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

TL;DR

DiffuX2CT tackles reconstructing 3D CT volumes from ultra-sparse biplanar X-rays, an ill-posed inverse problem. It introduces a 3D conditional diffusion framework where the forward process $q(\mathbf{y}_t|\mathbf{y}_{t-1})=\mathcal{N}(\mathbf{y}_t|\sqrt{1-\beta_t}\mathbf{y}_{t-1},\beta_t\mathbf{I})$ is inverted by a trainable reverse denoiser $p_\theta(\mathbf{y}_{t-1}|\mathbf{y}_t,\mathbf{x}_1,\mathbf{x}_2)$ conditioned on the two X-ray views $\mathbf{x}_1$ and $\mathbf{x}_2$. An implicit conditioning mechanism, via a tri-plane decoupling generator and an implicit neural decoder, provides 3D priors from 2D views to a 3D U-Net denoiser with shifted-window attention. Training uses a noise-prediction loss $\mathcal{L}_{simple}$, augmented by a geometry projection loss $\mathcal{L}_{proj}$ evaluated on the clean-volume estimate $\hat{\mathbf{y}}_0=(\mathbf{y}_t-\sqrt{1-\alpha_t}\,\hat{\varepsilon})/\sqrt{\alpha_t}$, where $\alpha_t=\prod_{i=1}^t(1-\beta_i)$ and $\hat{\varepsilon}$ is the predicted noise. The total loss is $\mathcal{L}_{total}=\mathcal{L}_{simple}+\lambda\mathcal{L}_{proj}$ with $\lambda=1/T$. Across LumbarV and three public datasets, DiffuX2CT achieves superior SSIM3D, FID, LPIPS, and segmentation Dice scores compared with regression and GAN baselines, demonstrating faithful 3D texture and geometry recovery from 2D X-rays. The work introduces LumbarV as a clinical benchmark and highlights potential utility in surgical planning where full CT is impractical.
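
To make the loss construction concrete, here is a minimal NumPy sketch of the quantities named in the summary: the cumulative product $\alpha_t$, a sample from the forward process, the clean-volume estimate $\hat{\mathbf{y}}_0$, and the combined loss with $\lambda=1/T$. It is illustrative only and rests on several assumptions not stated in the source: a linear $\beta_t$ schedule, a mean-squared-error form for $\mathcal{L}_{simple}$, the standard DDPM identity $\mathbf{y}_t=\sqrt{\alpha_t}\,\mathbf{y}_0+\sqrt{1-\alpha_t}\,\varepsilon$, and a toy `project` function (a mean along one axis) standing in for the paper's actual X-ray projection. All function names are hypothetical, and no denoising network is involved.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=2e-2):
    # Linear beta schedule: an assumption, the source does not state the schedule.
    betas = np.linspace(beta_start, beta_end, T)
    alphas = np.cumprod(1.0 - betas)  # alpha_t = prod_{i<=t} (1 - beta_i)
    return betas, alphas

def q_sample(y0, t, alphas, eps):
    # Closed-form marginal of the forward chain: y_t given y_0 and noise eps.
    return np.sqrt(alphas[t]) * y0 + np.sqrt(1.0 - alphas[t]) * eps

def predict_y0(yt, t, alphas, eps_hat):
    # Invert the marginal to estimate the clean volume from predicted noise.
    return (yt - np.sqrt(1.0 - alphas[t]) * eps_hat) / np.sqrt(alphas[t])

def project(vol, axis):
    # Toy parallel projection (mean along one axis); a hypothetical stand-in
    # for the real X-ray projection geometry.
    return vol.mean(axis=axis)

def total_loss(y0, yt, t, alphas, eps, eps_hat, T):
    # Noise-prediction term, assumed here to be a mean squared error.
    l_simple = np.mean((eps - eps_hat) ** 2)
    # Projection term: compare two orthogonal projections of y0_hat and y0.
    y0_hat = predict_y0(yt, t, alphas, eps_hat)
    l_proj = sum(
        np.mean(np.abs(project(y0_hat, a) - project(y0, a))) for a in (0, 1)
    )
    return l_simple + (1.0 / T) * l_proj  # lambda = 1 / T

rng = np.random.default_rng(0)
T = 1000
betas, alphas = make_schedule(T)
y0 = rng.standard_normal((8, 8, 8))   # stand-in for a CT volume
eps = rng.standard_normal(y0.shape)
t = 500
yt = q_sample(y0, t, alphas, eps)

# With the true noise as the "prediction", y0 is recovered and the loss vanishes.
y0_rec = predict_y0(yt, t, alphas, eps)
loss_perfect = total_loss(y0, yt, t, alphas, eps, eps, T)
```

In an actual training loop `eps_hat` would come from the conditional 3D denoiser; passing the true `eps` here only checks that the algebra round-trips.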

Abstract

Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific research and medical applications. However, it presents great challenges as it is inherently an ill-posed problem, often compromised by artifacts resulting from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is established with a 3D global coherence denoising model with a new, implicit conditioning mechanism. We realize the conditioning mechanism by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, which enables 3D structural information to be recovered from 2D X-rays, therefore producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our proposal.

Paper Structure (17 sections, 10 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Comparison of different reconstruction methods. The reconstructed CT images are rendered for a more intuitive illustration. Our DiffuX2CT significantly outperforms competitors, including both generative and regression methods, and generates high-quality, faithful CT images. Notably, it recovers precise bone structures and correctly positioned implants that are close to the ground truth.
  • Figure 2: Overview of our DiffuX2CT. Taking 2D biplanar orthogonal X-rays as inputs, the proposed implicit conditioning mechanism is implemented with a tri-plane decoupling generator and an implicit neural decoder to recover 3D structure information. By incorporating the 3D conditions, the 3D conditional denoising model iteratively generates CT images while preserving consistent structures provided in X-rays.
  • Figure 3: Qualitative comparisons on the CTSpine1K dataset. From top to bottom, we show the axial, sagittal, and coronal views. Best viewed zoomed in.
  • Figure 4: Qualitative comparisons on the LIDC-IDRI dataset. From top to bottom, we show the axial, sagittal, and coronal views. Best viewed zoomed in.
  • Figure 5: Rendering comparisons on the CTSpine1K dataset. The first and second rows show the visualizations of soft tissues and bones, respectively.
  • ...and 2 more figures