
Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction

Mohammadhossein Momeni, Vivek Gopalakrishnan, Neel Dey, Polina Golland, Sarah Frisken

TL;DR

DiffVox addresses sparse-view CBCT reconstruction by directly optimizing a discrete voxel grid $\hat{\bm \mu}$ of X-ray attenuation coefficients with a physics-based differentiable X-ray renderer. It systematically compares two forward models (exact line integrals via Siddon's method and fast trilinear interpolation) within a unified optimization that includes total variation (TV) regularization and nonnegativity constraints. On real X-ray data from 42 walnuts, DiffVox achieves state-of-the-art 3D reconstruction and novel-view fidelity in the sparse-view regime. Siddon's method gives the best accuracy at a modest runtime cost while still training substantially faster than prior neural-field approaches. This work could enable high-quality CBCT at reduced radiation exposure, and it motivates further integration of realistic forward models and geometry optimization into differentiable frameworks.
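The optimization described above pairs a rendering loss with TV regularization and a nonnegativity constraint. The pure-Python sketch below shows one way a single update can look on a tiny voxel grid: a smoothed anisotropic TV penalty with its gradient, followed by a projected gradient step that clamps attenuation values at zero. It is an illustration under our own assumptions, not the DiffVox code. The names `smoothed_tv` and `projected_step`, the smoothing constant `eps`, the weights `lam` and `lr`, and the choice of projection (rather than, for example, a reparameterization) to enforce nonnegativity are all hypothetical, and `data_grad` stands in for the gradient of the rendering loss that an automatic-differentiation framework would supply.

```python
import math


def smoothed_tv(vol, shape, eps=1e-8):
    """Smoothed anisotropic TV of a flat 3D grid, and its gradient (hypothetical helper).

    Each pair of face-adjacent voxels contributes sqrt(d**2 + eps), d = their difference.
    """
    nx, ny, nz = shape
    tv = 0.0
    grad = [0.0] * len(vol)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = (i * ny + j) * nz + k
                # Forward neighbor along x, y, and z; pairs that leave the grid are skipped.
                for ii, jj, kk in ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1)):
                    if ii >= nx or jj >= ny or kk >= nz:
                        continue
                    q = (ii * ny + jj) * nz + kk
                    d = vol[q] - vol[p]
                    r = math.sqrt(d * d + eps)
                    tv += r
                    grad[q] += d / r  # derivative of r with respect to vol[q]
                    grad[p] -= d / r  # derivative of r with respect to vol[p]
    return tv, grad


def projected_step(vol, data_grad, shape, lam=0.1, lr=0.5):
    """One gradient step on (data loss + lam * TV), then projection onto vol >= 0.

    data_grad is a placeholder for the rendering-loss gradient (an assumption).
    """
    _, tv_grad = smoothed_tv(vol, shape)
    return [max(0.0, v - lr * (g + lam * t)) for v, g, t in zip(vol, data_grad, tv_grad)]
```

Starting from a 2×2×2 checkerboard of zeros and ones with `data_grad` set to zero, one step pulls every voxel toward its neighbors (to 0.15 or 0.85), lowering the TV from about 12 to about 8.4. A large positive `data_grad` on a zero-valued voxel would push it negative, and the projection clamps it back to 0.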

Abstract

We present DiffVox, a self-supervised framework for Cone-Beam Computed Tomography (CBCT) reconstruction that directly optimizes a voxel grid representation using physics-based differentiable X-ray rendering. We further investigate how different implementations of the X-ray image formation model in the renderer affect the quality of 3D reconstruction and novel view synthesis. We find that, when combined with our regularized voxel-based learning framework, an exact implementation of the discrete Beer-Lambert law for X-ray attenuation in the renderer outperforms both widely used iterative CBCT reconstruction algorithms and modern neural field approaches, particularly when only a few input views are available. As a result, we reconstruct high-fidelity 3D CBCT volumes from fewer X-rays, potentially reducing ionizing radiation exposure and improving diagnostic utility. Our implementation is available at https://github.com/hossein-momeni/DiffVox.
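For reference, the Beer-Lambert law and its discrete voxel form are commonly written as follows. The notation here is ours, and the paper's own equations may differ in detail (for example, in how the source intensity $I_0$ or detector effects are handled). For a ray $r$ from the X-ray source to a detector pixel, with $\ell_v(r)$ the length of $r$ inside voxel $v$:

$$
I(r) = I_0 \exp\!\left(-\int_{r} \mu(\mathbf{x})\,\mathrm{d}\mathbf{x}\right)
\quad\Longrightarrow\quad
I(r) = I_0 \exp\!\left(-\sum_{v} \mu_v\,\ell_v(r)\right),
\qquad
-\log\frac{I(r)}{I_0} = \sum_{v} \mu_v\,\ell_v(r).
$$

Siddon's method computes each $\ell_v(r)$ exactly, whereas trilinear interpolation approximates the sum by sampling an interpolated $\mu$ at discrete points along the ray.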

Paper Structure

This paper contains 4 sections, 4 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: (A) In sparse-view CBCT reconstruction, a small number of X-ray images are acquired in a circular orbit about a subject. We compare two implementations of the X-ray image formation model for reconstruction via differentiable rendering: (B) Siddon's method and (C) trilinear interpolation.
  • Figure 2: (A) CBCT reconstructions of an exemplar walnut from the test set using 15 input views, each annotated with the Structural Similarity Index Measure (SSIM) of the 3D reconstruction. Blue insets highlight where our methods outperform the baselines, with sharper boundaries and fewer artifacts. Red insets indicate areas where all methods struggle, particularly thin structures. (B) Novel views rendered from these estimated volumes, compared against a ground-truth X-ray image not seen during reconstruction and likewise annotated with SSIM. Each method's novel views were rendered with that method's own forward model.
  • Figure 3: (A) Quality of reconstructed 3D volumes, (B) quality of novel 2D views rendered from these volumes, and (C) reconstruction runtimes, across a range of input-view counts. DiffVox, using either Siddon's method or trilinear interpolation as the forward model, achieves the highest-quality reconstructions and renderings, with especially large gains in the sparsest-view settings. Error bars show the standard error (SE).
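Figure 1 contrasts the two forward models. The pure-Python sketch below computes one ray's line integral $\sum_v \mu_v \ell_v$ both ways on a toy grid. `siddon_line_integral` follows Siddon's parametric idea in a simplified "merge and midpoint" form: intersect the ray with every voxel plane, sort the crossing parameters, and assign each segment to the voxel containing its midpoint, which yields exact intersection lengths. `trilinear_line_integral` instead samples a trilinearly interpolated volume at evenly spaced points and applies a midpoint rule. All function names, the flat `(i * ny + j) * nz + k` indexing, treating voxel values as samples at voxel centers with zeros outside the grid, and the default `n_samples` are assumptions for illustration; the DiffVox renderer will differ in implementation details.

```python
import math


def siddon_line_integral(mu, shape, spacing, origin, src, dst):
    """Exact sum(mu_v * l_v) along segment src -> dst through a voxel grid (sketch).

    mu is a flat list indexed by (i * ny + j) * nz + k; origin is the outer corner
    of voxel (0, 0, 0); the ray is parametrized as src + t * (dst - src), t in [0, 1].
    """
    direction = [dst[a] - src[a] for a in range(3)]
    length = math.sqrt(sum(c * c for c in direction))
    t_enter, t_exit = 0.0, 1.0
    crossings = []
    for a in range(3):
        lo, hi = origin[a], origin[a] + shape[a] * spacing[a]
        if direction[a] == 0.0:
            # Ray is parallel to this axis' planes: it misses the grid unless inside the slab.
            if not lo <= src[a] <= hi:
                return 0.0
            continue
        t_lo, t_hi = sorted(((lo - src[a]) / direction[a], (hi - src[a]) / direction[a]))
        t_enter, t_exit = max(t_enter, t_lo), min(t_exit, t_hi)
        # Parameters where the ray crosses each plane origin_a + i * spacing_a.
        crossings += [(origin[a] + i * spacing[a] - src[a]) / direction[a] for i in range(shape[a] + 1)]
    if t_exit <= t_enter:
        return 0.0
    # Merge crossings inside [t_enter, t_exit]; consecutive values bound one voxel segment.
    ts = sorted({t_enter, t_exit} | {t for t in crossings if t_enter < t < t_exit})
    total = 0.0
    for t0, t1 in zip(ts, ts[1:]):
        mid = 0.5 * (t0 + t1)  # the segment midpoint identifies its voxel
        idx = [math.floor((src[a] + mid * direction[a] - origin[a]) / spacing[a]) for a in range(3)]
        if all(0 <= idx[a] < shape[a] for a in range(3)):
            total += mu[(idx[0] * shape[1] + idx[1]) * shape[2] + idx[2]] * (t1 - t0) * length
    return total


def trilinear_sample(mu, shape, spacing, origin, point):
    """Trilinearly interpolate mu at a point; values sit at voxel centers, zero outside."""
    nx, ny, nz = shape
    u = [(point[a] - origin[a]) / spacing[a] - 0.5 for a in range(3)]
    base = [math.floor(c) for c in u]
    frac = [c - b for c, b in zip(u, base)]
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                i, j, k = base[0] + di, base[1] + dj, base[2] + dk
                if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
                    w = ((frac[0] if di else 1.0 - frac[0])
                         * (frac[1] if dj else 1.0 - frac[1])
                         * (frac[2] if dk else 1.0 - frac[2]))
                    value += w * mu[(i * ny + j) * nz + k]
    return value


def trilinear_line_integral(mu, shape, spacing, origin, src, dst, n_samples=512):
    """Approximate the line integral with a midpoint rule over interpolated samples."""
    direction = [dst[a] - src[a] for a in range(3)]
    length = math.sqrt(sum(c * c for c in direction))
    step = 1.0 / n_samples
    total = 0.0
    for m in range(n_samples):
        t = (m + 0.5) * step
        point = [src[a] + t * direction[a] for a in range(3)]
        total += trilinear_sample(mu, shape, spacing, origin, point)
    return total * step * length
```

For an axis-aligned ray through voxel centers of a 4×4×4 grid of ones with unit spacing, both functions return approximately 4, the path length inside the grid. Off-center rays expose the modeling difference: because the interpolated volume fades to zero over the outer half voxel, a ray at $y = 0.25$ gets about 3 from trilinear interpolation but exactly 4 from the Siddon-style traversal.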