Differentiable Voxel-based X-ray Rendering Improves Sparse-View 3D CBCT Reconstruction
Mohammadhossein Momeni, Vivek Gopalakrishnan, Neel Dey, Polina Golland, Sarah Frisken
TL;DR
DiffVox addresses sparse-view CBCT reconstruction by directly optimizing a discrete voxelgrid $\hat{\bm \mu}$ with a physics-based differentiable X-ray renderer. It systematically compares two forward models, Siddon's exact line integrals and fast trilinear interpolation, within a unified optimization that includes TV regularization and nonnegativity constraints. On real X-ray data from 42 walnuts, DiffVox achieves state-of-the-art reconstruction and novel-view fidelity in the sparse-view regime, with Siddon's method offering the best accuracy at a modest runtime cost and substantially faster training than prior neural-field approaches. This work enables high-quality CBCT with reduced radiation exposure and motivates further integration of realistic forward models and geometry optimization in differentiable frameworks.
Abstract
We present DiffVox, a self-supervised framework for Cone-Beam Computed Tomography (CBCT) reconstruction that directly optimizes a voxelgrid representation using physics-based differentiable X-ray rendering. Further, we investigate how different implementations of the X-ray image formation model in the renderer affect the quality of 3D reconstruction and novel view synthesis. When combined with our regularized voxel-based learning framework, we find that using an exact implementation of the discrete Beer-Lambert law for X-ray attenuation in the renderer outperforms both widely used iterative CBCT reconstruction algorithms and modern neural field approaches, particularly when given only a few input views. As a result, we reconstruct high-fidelity 3D CBCT volumes from fewer X-rays, potentially reducing ionizing radiation exposure and improving diagnostic utility. Our implementation is available at https://github.com/hossein-momeni/DiffVox.
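To make the discrete Beer-Lambert law concrete, the following is a minimal sketch in pure Python of an exact line integral through a piecewise-constant grid, in the spirit of Siddon's method. It is an illustration under simplifying assumptions (a 2D grid with its corner at the origin, a single ray, no detector geometry), not the DiffVox implementation; the function names `siddon_2d`, `line_integral`, and `transmitted_intensity` are hypothetical. The ray is split at every grid-line crossing, each piece is assigned to the pixel containing its midpoint, and the attenuation is the sum of pixel value times intersection length.

```python
import math


def siddon_2d(src, dst, nx, ny, spacing=1.0):
    """Return [(ix, iy, length), ...] for the segment src -> dst.

    Assumes an nx-by-ny grid of square pixels whose lower-left corner
    sits at the origin (an illustrative convention, not the paper's).
    """
    (x0, y0), (x1, y1) = src, dst
    dx, dy = x1 - x0, y1 - y0
    ray_len = math.hypot(dx, dy)

    # Parametric positions (0 = source, 1 = target) where the ray
    # crosses a vertical or horizontal grid line.
    alphas = {0.0, 1.0}
    if dx != 0.0:
        for i in range(nx + 1):
            alphas.add((i * spacing - x0) / dx)
    if dy != 0.0:
        for j in range(ny + 1):
            alphas.add((j * spacing - y0) / dy)
    alphas = sorted(a for a in alphas if 0.0 <= a <= 1.0)

    segments = []
    for a0, a1 in zip(alphas, alphas[1:]):
        if a1 - a0 <= 1e-12:  # skip near-duplicate crossings
            continue
        mid = 0.5 * (a0 + a1)
        ix = math.floor((x0 + mid * dx) / spacing)
        iy = math.floor((y0 + mid * dy) / spacing)
        if 0 <= ix < nx and 0 <= iy < ny:  # keep only pieces inside the grid
            segments.append((ix, iy, (a1 - a0) * ray_len))
    return segments


def line_integral(mu, src, dst, spacing=1.0):
    """Sum of attenuation times intersection length; mu is indexed mu[iy][ix]."""
    ny, nx = len(mu), len(mu[0])
    return sum(mu[iy][ix] * length
               for ix, iy, length in siddon_2d(src, dst, nx, ny, spacing))


def transmitted_intensity(mu, src, dst, i0=1.0, spacing=1.0):
    """Discrete Beer-Lambert law: I = I0 * exp(-sum_k mu_k * l_k)."""
    return i0 * math.exp(-line_integral(mu, src, dst, spacing))
```

For a uniform 2-by-2 grid with attenuation 0.5 and unit spacing, a horizontal ray from (-1, 0.5) to (3, 0.5) crosses two pixels with unit intersection length each, so `line_integral` gives 1.0 and `transmitted_intensity` gives exp(-1). Because the line integral is linear in the pixel values, its gradient with respect to each pixel is simply that pixel's intersection length, which is what makes this kind of forward model straightforward to differentiate.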
