
DIFR3CT: Latent Diffusion for Probabilistic 3D CT Reconstruction from Few Planar X-Rays

Yiran Sun, Hana Baroudi, Tucker Netherton, Laurence Court, Osama Mawlawi, Ashok Veeraraghavan, Guha Balakrishnan

TL;DR

DIFR3CT introduces a conditional latent diffusion framework for high-quality 3D CT reconstruction from extremely sparse planar x-rays. By fusing multi-view 2D features into a 3D conditioning volume and operating in a compact latent CT space via a 3D VQGAN, it delivers probabilistic reconstructions with Monte Carlo (MC) uncertainty estimates and improved voxel-level fidelity over state-of-the-art sparse-view baselines. The approach outperforms those baselines in PSNR and SSIM on the public LIDC dataset and an in-house post-mastectomy CT dataset, and shows preliminary feasibility for automated radiotherapy (RT) contouring and planning. This work offers a practical path toward accessible 3D imaging and RT planning in resource-constrained settings. Real-world validation and handling of acquisition variability are identified as avenues for future work.

Abstract

Computed Tomography (CT) scans are the standard of care for the visualization and diagnosis of many clinical ailments, and are needed for the treatment planning of external beam radiotherapy. Unfortunately, the availability of CT scanners in low- and mid-resource settings is highly variable. Planar x-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work we propose DIFR3CT, a 3D latent diffusion model that can generate a distribution of plausible CT volumes from one or a few (<10) planar x-ray observations. DIFR3CT works by fusing 2D features from each x-ray into a joint 3D space, and performing diffusion conditioned on these fused features in a low-dimensional latent space. We conduct extensive experiments demonstrating that DIFR3CT outperforms recent sparse CT reconstruction baselines on standard pixel-level metrics (PSNR, SSIM) on both the public LIDC dataset and an in-house post-mastectomy CT dataset. We also show that DIFR3CT supports uncertainty quantification via Monte Carlo sampling, which provides an opportunity to measure reconstruction reliability. Finally, we perform a pilot study evaluating DIFR3CT for automated breast radiotherapy contouring and planning, and demonstrate promising feasibility. Our code is available at https://github.com/yransun/DIFR3CT.
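
The Monte Carlo uncertainty estimate and the PSNR metric mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: `sample_reconstruction` is a hypothetical stand-in that returns a constant volume plus noise, where a real sampler would run the conditional reverse diffusion and decode the latent. The volume size, sample count, and intensity range of [0, 1] are likewise assumptions chosen for illustration.

```python
import numpy as np

def sample_reconstruction(xrays, rng):
    """Hypothetical stand-in for one stochastic reconstruction sample.

    A real sampler would denoise a random latent conditioned on the x-rays
    and decode it; here we return a constant volume plus Gaussian noise so
    that the sketch runs on its own.
    """
    base = np.full((8, 8, 8), 0.5, dtype=np.float32)
    return base + 0.05 * rng.standard_normal(base.shape).astype(np.float32)

def mc_uncertainty(xrays, n_samples=16, seed=0):
    """Voxel-wise mean and standard deviation over repeated samples."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [sample_reconstruction(xrays, rng) for _ in range(n_samples)]
    )
    # The mean acts as a point estimate; the std is a per-voxel uncertainty map.
    return samples.mean(axis=0), samples.std(axis=0)

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for volumes scaled to [0, data_range]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# xrays=None is acceptable only because the stand-in sampler ignores its input.
mean_vol, std_vol = mc_uncertainty(xrays=None)
```

Voxels with a high standard deviation are those where the samples disagree, which is one way to flag regions of a reconstruction that deserve less trust.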

Paper Structure (20 sections, 14 equations, 9 figures, 2 tables)


Figures (9)

  • Figure 1: Extremely sparse-view CT reconstruction may be helpful in various low-resource settings for clinical applications like radiotherapy (RT) planning. In the ideal RT pipeline (top), a CT scan of the patient is acquired and fed to an RT planning system. The resulting RT plan, which assigns doses to different anatomical regions, is examined and potentially corrected by a clinician before being applied to the patient. When CT scanners are unavailable (bottom), we are interested in reconstructing CT volumes in sufficient detail from extremely sparse planar x-ray images. We propose DIFR3CT for such applications.
  • Figure 2: Overview of DIFR3CT. DIFR3CT consists of two parts. (a) Feature fusion of multi-view x-rays: We extract a feature image $W_k$ from each input planar x-ray $X_k$ with a 2D U-Net. We then re-project $W_k$ back into 3D space using the known x-ray acquisition settings. We average all re-projected feature volumes into one feature volume $F_{avg}$. (b) 3D conditional latent diffusion model: During training, each CT volume is encoded into a latent code $Z_0$ using a pretrained encoder [ge2022long]. We train a time-conditioned 3D denoising U-Net to take a random noisy latent code $Z_t$ and the conditioning signal $F_{avg}$, and output a partially denoised code $Z_{t-1}$. After $T$ steps, the predicted code $\hat{Z}_0$ is reconstructed into a CT volume using a pretrained decoder.
  • Figure 3: Example x-rays generated by the TIGRE [biguri2016tigre] digitally reconstructed radiograph (DRR) generator for one LIDC CT volume. We generated these x-rays at eight angles around the CT volume; the angle, in degrees, is printed in the top-left corner of each x-ray.
  • Figure 4: Comparison of DIFR3CT with baselines on the LIDC dataset, given biplanar x-ray inputs. Each row corresponds to a different center planar view of the CT volume (axial, coronal, sagittal). The second through fifth columns show the four baselines (labeled on each image), and the final column shows the 3D CT reconstructed by the proposed DIFR3CT method. We also report PSNR/SSIM values for each slice. DIFR3CT generates the most realistic reconstructed details of all methods.
  • Figure 5: Example 3D LIDC CT reconstruction results for one patient, with varying numbers of input views. Each row corresponds to a different center planar view of the CT volume (axial, coronal, sagittal), and each of the second through sixth columns corresponds to a different number of views (labeled on each image). We also report PSNR/SSIM values for each slice. As the number of input viewing angles increases, the reconstruction details improve, especially near anatomical boundaries.
  • ...and 4 more figures
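
The feature-fusion step described in the Figure 2 caption (re-project each feature image $W_k$ into 3D, then average into $F_{avg}$) can be sketched in NumPy under strong simplifying assumptions. The sketch assumes parallel rays aligned with a volume axis, whereas a real x-ray unit has cone-beam geometry determined by its acquisition settings. The function names `backproject_parallel` and `fuse_views` are hypothetical, and the constant feature images stand in for 2D U-Net outputs.

```python
import numpy as np

def backproject_parallel(feature_img, ray_axis, depth):
    """Smear a (C, H, W) feature image along one axis of a 3D volume.

    Assumption: rays are parallel and aligned with volume axis `ray_axis`
    (0, 1, or 2), so every voxel on a ray receives that ray's pixel features.
    """
    # Axis 0 holds channels, so the spatial ray axis is offset by one.
    vol = np.expand_dims(feature_img, axis=ray_axis + 1)
    return np.repeat(vol, depth, axis=ray_axis + 1)

def fuse_views(feature_imgs, ray_axes, size):
    """Average the back-projected feature volumes into one (C, N, N, N) volume."""
    vols = [
        backproject_parallel(w, axis, size)
        for w, axis in zip(feature_imgs, ray_axes)
    ]
    return np.mean(vols, axis=0)

# Two toy views with 2 feature channels on a 4x4 detector, taken along
# volume axes 0 and 2 (loosely analogous to a biplanar setup).
views = [np.ones((2, 4, 4)), 3.0 * np.ones((2, 4, 4))]
f_avg = fuse_views(views, ray_axes=[0, 2], size=4)
```

Averaging rather than concatenating keeps the shape of the conditioning volume independent of the number of views, which is consistent with using the same model for one or several x-rays.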