DVG-Diffusion: Dual-View Guided Diffusion Model for CT Reconstruction from X-Rays

Xing Xie, Jiawei Liu, Huijie Fan, Zhi Han, Yandong Tang, Liangqiong Qu

TL;DR

DVG-Diffusion tackles few-view CT reconstruction with a dual-view diffusion framework that jointly leverages a real input X-ray and a synthesized additional view. A view-parameter guided encoder (VPGE) back-projects X-rays into 3D space and encodes them into a CT-aligned latent space, so that the mapping to CT is learned between 3D representations in the latent domain. The method synthesizes a new X-ray view and uses the dual-view latent features as diffusion conditions to refine the CT latent representation, which is then decoded into a CT volume. The reported results show state-of-the-art performance with a favorable trade-off between fidelity and perceptual quality. Ablations and analyses examine the roles of VPGE and new-view guidance, and the approach is reported to be robust across different numbers of input views and angle sampling ranges, with clinical data showing improved boundary and pathology delineation.

Abstract

Directly reconstructing a 3D CT volume from few-view 2D X-rays using an end-to-end deep learning network is a challenging task, as X-ray images are merely projection views of the 3D CT volume. In this work, we facilitate the complex mapping from 2D X-ray images to 3D CT by incorporating new view synthesis, and reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view and a synthesized new X-ray view to jointly guide CT reconstruction. First, a novel view parameter-guided encoder captures features from X-rays that are spatially aligned with CT. Next, we concatenate the extracted dual-view features as conditions for the latent diffusion model to learn and refine the CT latent representation. Finally, the CT latent representation is decoded into a CT volume in pixel space. By incorporating view parameter-guided encoding and dual-view guided CT reconstruction, DVG-Diffusion achieves an effective balance between high fidelity and perceptual quality for CT reconstruction. Experimental results demonstrate that our method outperforms state-of-the-art methods. Based on these experiments, we also present a comprehensive analysis and discussion of views and reconstruction.
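The conditioning step described above, concatenating the dual-view features for the latent diffusion model, can be illustrated with a small shape-level sketch. This is not the authors' implementation: the function name `build_denoiser_input`, the channel-first layout, the latent sizes, and the choice to stack the conditions with the noisy CT latent along the channel axis are all assumptions made for illustration.

```python
import numpy as np

def build_denoiser_input(z_t, z_in, z_new):
    """Stack a noisy CT latent with two view-condition latents along channels.

    Hypothetical helper: all arrays are assumed to be (C, D, H, W) and to share
    the same spatial shape, which is what CT-aligned encoding would make possible.
    """
    if z_in.shape[1:] != z_t.shape[1:] or z_new.shape[1:] != z_t.shape[1:]:
        raise ValueError("condition latents must share the CT latent's spatial shape")
    # Dual-view condition: real-view latent followed by synthesized-view latent.
    cond = np.concatenate([z_in, z_new], axis=0)
    # Assumed conditioning scheme: channel-wise concatenation with the noisy latent.
    return np.concatenate([z_t, cond], axis=0)

rng = np.random.default_rng(0)
C, D, H, W = 4, 8, 8, 8                      # illustrative latent size only
z_t = rng.standard_normal((C, D, H, W))      # noisy CT latent at some timestep
z_in = rng.standard_normal((C, D, H, W))     # latent of the real input view
z_new = rng.standard_normal((C, D, H, W))    # latent of the synthesized view
x = build_denoiser_input(z_t, z_in, z_new)
print(x.shape)  # (12, 8, 8, 8)
```

Because the view latents are spatially aligned with the CT latent, a simple concatenation like this is enough to present the conditions to a 3D denoiser; other conditioning mechanisms are equally possible and the abstract does not specify which one is used.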

Paper Structure

This paper contains 20 sections, 9 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Framework of the proposed DVG-Diffusion. (a) We utilize dual-view (i.e., the real input view $I_{in}$ and the synthesized new view $I_{new}$) to jointly guide the reconstruction of the finer CT volume ($V_{ct}$). $I_{in}$ and $I_{new}$ are used as conditions to guide diffusion models to restore more detail. (b) We synthesize the new view ($I_{new}$) by predicting the initial CT volume from the input view ($I_{in}\to V_{ct}^{init}$) and then forward projecting the predicted CT to the new view X-ray ($V_{ct}^{init}\to I_{new}$). A core design is our view parameter guided encoder (VPGE). To more easily align the feature space and learn the mapping from 2D images to 3D CT volume, in our VPGE, the input X-rays are first back projected into 3D space and then encoded into latent space ($I_{in} \to V_{bp} \to z_{in}$).
  • Figure 2: The detailed process of back projection and VPGE using biplanar inputs as an example. The backprojector (BP) unfolds the points on each 2D projection into 3D space separately and superimposes them to obtain a geometrically consistent 3D image $V_{bp}$. The robust discrete feature encoder $\mathcal{E}_{bp}$ is then obtained by self-supervised training of VQ-GAN.
  • Figure 3: Example of reconstructed CT slices from biplanar X-rays using the proposed DVG-Diffusion and competing methods, along with the prediction error maps. Two sets of samples are slices along the axial plane (AXI) and coronal plane (COR). Red arrows in the figure indicate instances of contour distortion.
  • Figure 4: Visual results of baseline and the proposed DVG-Diffusion in ablation studies, using single-view and biplanar X-rays as inputs.
  • Figure 5: Performance curves of CT reconstruction using DVG-Diffusion (without new view synthesis) for different numbers of input views and different angle sampling ranges, and a comparison with ReconNet.
  • ...and 4 more figures
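The back projection and forward projection steps named in the captions of Figures 1 and 2 can be sketched with a simplified geometry. The sketch below assumes an idealized parallel-beam setup with two orthogonal views on a cubic grid, which is a simplification rather than the paper's view-parameter guided projector; the function names and the `(z, y, x)` axis convention are hypothetical.

```python
import numpy as np

def forward_project(volume, axis):
    """Parallel-beam forward projection (assumed geometry): sum along the ray axis."""
    return volume.sum(axis=axis)

def back_project(image, axis, depth):
    """Smear a 2D projection uniformly back along the ray axis into 3D."""
    smeared = np.repeat(np.expand_dims(image, axis=axis), depth, axis=axis)
    return smeared / depth  # spread each pixel's value evenly over its ray

def biplanar_back_projection(frontal, lateral, size):
    """Back-project two orthogonal views separately and superimpose them."""
    v_front = back_project(frontal, axis=1, depth=size)  # rays run along y
    v_lat = back_project(lateral, axis=2, depth=size)    # rays run along x
    return v_front + v_lat

size = 8
volume = np.zeros((size, size, size))        # axes ordered (z, y, x)
volume[2, 3, 5] = 1.0                        # a single bright voxel
frontal = forward_project(volume, axis=1)    # image indexed by (z, x)
lateral = forward_project(volume, axis=2)    # image indexed by (z, y)
v_bp = biplanar_back_projection(frontal, lateral, size)
peak = np.unravel_index(np.argmax(v_bp), v_bp.shape)
print(peak)
```

In this toy case the two smeared rays intersect only at the original voxel, so the superimposed volume peaks there. This illustrates why a back-projected volume is spatially aligned with the CT even though it is far from a faithful reconstruction.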