DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s)

Yun Su Jeong, Hye Bin Yoo, Il Yong Chun

TL;DR

DX2CT reconstructs high-fidelity 3D CT volumes from limited 2D X-ray inputs using a novel 3D Positional Query Transformer (3DPQT) that modulates X-ray features with CT slice positions. This position-aware conditioning is injected into a conditional DDPM via spatially-adaptive normalization (SPADE), enabling end-to-end training that yields improved reconstructions over state-of-the-art methods on the LIDC dataset. Ablations show that both 3DPQT and SPADE are critical to the performance gains, and real-world experiments demonstrate sharper, more structurally accurate CT volumes with DX2CT. The approach promises safer imaging by reducing radiation exposure while maintaining diagnostic detail.

Abstract

Computed tomography (CT) provides high-resolution medical imaging, but it can expose patients to high radiation. X-ray scanners have low radiation exposure, but their resolution is low. This paper proposes a new conditional diffusion model, DX2CT, that reconstructs three-dimensional (3D) CT volumes from bi- or mono-planar X-ray image(s). The proposed DX2CT consists of two key components: 1) modulating feature maps extracted from two-dimensional (2D) X-ray(s) with 3D positions of the CT volume using a new transformer, and 2) effectively using the modulated 3D position-aware feature maps as conditions of DX2CT. In particular, the proposed transformer can provide conditions with rich information about a target CT slice to the conditional diffusion model, enabling high-quality CT reconstruction. Our experiments with the bi- or mono-planar X-ray(s) benchmark datasets show that the proposed DX2CT outperforms several state-of-the-art methods. Our code and model will be available at: https://www.github.com/intyeger/DX2CT.
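The first component, modulating X-ray features with the position of a target CT slice, can be illustrated with a small cross-attention sketch. This is not the authors' implementation: it uses a single head and one block, random weights, and NumPy only. The function name `position_query_attention`, the token shapes, and the choice to concatenate the two X-ray views into one key/value sequence are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_query_attention(pos_query, xray_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    pos_query:   (N_q, d)  position encodings for one target CT slice (query)
    xray_tokens: (N_kv, d) flattened X-ray feature tokens (key and value)
    Returns the position-aware features (N_q, d) and the attention map.
    """
    q = pos_query @ w_q
    k = xray_tokens @ w_k
    v = xray_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
d, h, w = 16, 4, 4

# Stand-ins for features of the two X-ray views (random, for shape checking only).
pa_tokens = rng.normal(size=(h * w, d))
lat_tokens = rng.normal(size=(h * w, d))
# Assumption: both views are concatenated into one key/value sequence.
xray_tokens = np.concatenate([pa_tokens, lat_tokens], axis=0)

# Stand-in for the position encoding of one target slice.
pos_query = rng.normal(size=(h * w, d))

w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
cond, attn = position_query_attention(pos_query, xray_tokens, w_q, w_k, w_v)
print(cond.shape, attn.shape)
```

Because the query depends on the slice position, each target slice produces a different condition map from the same X-ray features, which is the property the abstract attributes to the proposed transformer.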

Paper Structure (14 sections, 6 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overview of the proposed DX2CT in the reverse diffusion process. We extract multi-scale feature maps from biplanar X-rays with a feature extractor $\mathcal{E}$ and modulate them with 3D positional information of the target CT slice and of the X-rays, generated by position-encoding networks $\mathcal{P}$ and $\mathcal{Q}$, respectively. We use spatially-adaptive normalization (SPADE) to incorporate the modulated feature maps into the denoising 2D U-Net $\mathcal{D}$ of the conditional diffusion model. We stack the generated CT slices to reconstruct a 3D CT volume and repeat the process for each anatomical plane.
  • Figure 2: The architecture of the proposed 3DPQT (at the $l$th scale). We use the X-ray features $\mathbf{f}^{\text{PA}}_l$ and $\mathbf{f}^{\text{Lat}}_l$ as key and value, and $\mathbf{p}^{m}_{n,l}$ as query. We pass the query, key, and value through $B$ multi-head cross-attention blocks to generate the 3D position-aware feature map $\mathbf{c}^m_{n,l}$.
  • Figure 3: Comparisons of reconstructed 3D CTs with different methods (biplanar X-rays).
  • Figure 4: Comparisons of reconstructed 3D CTs with different methods (real-style biplanar X-rays).
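
The SPADE conditioning named in the Figure 1 caption can be sketched as follows. This is a simplified illustration, not the paper's code: activations are normalized per channel, then scaled and shifted by spatial maps predicted from the condition. Here the scale and shift come from 1x1 convolutions (plain per-pixel linear maps), whereas the original SPADE formulation uses small convolutional networks. The function name `spade`, the weight names, and all shapes are hypothetical.

```python
import numpy as np

def spade(x, cond, w_gamma, w_beta, eps=1e-5):
    """Spatially-adaptive normalization, simplified (hypothetical helper).

    x:       (C, H, W)  activations of a denoising-network layer
    cond:    (Cc, H, W) condition map at the same spatial size
    w_gamma: (C, Cc)    1x1-conv weights producing the per-pixel scale
    w_beta:  (C, Cc)    1x1-conv weights producing the per-pixel shift
    """
    # Parameter-free normalization per channel over the spatial dimensions
    # (an assumption; other normalization layers could be used instead).
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)

    # Per-pixel scale and shift predicted from the condition map.
    gamma = np.einsum("oc,chw->ohw", w_gamma, cond)
    beta = np.einsum("oc,chw->ohw", w_beta, cond)
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))        # stand-in layer activations
cond = rng.normal(size=(3, 4, 4))     # stand-in position-aware condition map
w_gamma = rng.normal(size=(8, 3))
w_beta = rng.normal(size=(8, 3))

out = spade(x, cond, w_gamma, w_beta)

# Sanity check: unit scale and zero shift reduce SPADE to plain normalization.
ident = spade(x, np.ones((1, 4, 4)), np.ones((8, 1)), np.zeros((8, 1)))
print(out.shape)
```

Unlike a single global scale and shift, the modulation varies per pixel, so spatial structure in the condition map carries over into the denoising network's activations.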