MInDI-3D: Iterative Deep Learning in 3D for Sparse-view Cone Beam Computed Tomography

Daniel Barco, Marc Stadelmann, Martin Oswald, Ivo Herzig, Lukas Lichtensteiger, Pascal Paysan, Igor Peterlik, Michal Walczak, Bjoern Menze, Frank-Peter Schilling

TL;DR

MInDI-3D extends InDI to full 3D volumes, enabling iterative diffusion-based artefact removal for sparse-view CBCT to reduce radiation exposure. Trained on a large pseudo-CBCT dataset derived from CT-RATE and validated on a real-world HyperSight set, it shows substantial PSNR/SSIM gains over uncorrected scans (e.g., +12.96 dB PSNR with 50 projections on the CT-RATE pseudo-CBCT test set) and performance competitive with 3D U-Net baselines, with strong generalisation across anatomies and projection levels. The number of iterative sampling steps gives control over the perception-distortion trade-off. Clinicians gave positive feedback for patient positioning tasks, while dose calculation and contouring exposed domain-shift challenges. Overall, MInDI-3D shows promise as a clinically viable tool for high-fidelity 3D CBCT reconstruction at reduced radiation dose, supported by scalable training data, extensive quantitative and clinical evaluations, and a publicly released pseudo-CBCT dataset to foster future research.

Abstract

We present MInDI-3D (Medical Inversion by Direct Iteration in 3D), the first 3D conditional diffusion-based model for real-world sparse-view Cone Beam Computed Tomography (CBCT) artefact removal, aiming to reduce imaging radiation exposure. A key contribution is extending the "InDI" concept from 2D to a full 3D volumetric approach for medical images, implementing an iterative denoising process that refines the CBCT volume directly from sparse-view input. A further contribution is the generation of a large pseudo-CBCT dataset (16,182) from chest CT volumes of the CT-RATE public dataset to robustly train MInDI-3D. We performed a comprehensive evaluation, including quantitative metrics, scalability analysis, generalisation tests, and a clinical assessment by 11 clinicians. Our results show MInDI-3D's effectiveness, achieving a 12.96 (6.10) dB PSNR gain over uncorrected scans with only 50 projections on the CT-RATE pseudo-CBCT (independent real-world) test set and enabling an 8x reduction in imaging radiation exposure. We demonstrate its scalability by showing that performance improves with more training data. Importantly, MInDI-3D matches the performance of a 3D U-Net on real-world scans from 16 cancer patients across distortion and task-based metrics. It also generalises to new CBCT scanner geometries. Clinicians rated our model as sufficient for patient positioning across all anatomical sites and found it preserved lung tumour boundaries well.
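
The PSNR gain quoted above is the difference between the PSNR of the corrected volume and the PSNR of the uncorrected sparse-view volume, both measured against the same ground truth. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function names `psnr` and `psnr_gain`, the flattened-list representation of a volume, and the `data_range` of 1.0 are assumptions made for illustration, and the toy voxel values are invented.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized voxel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("volumes must have the same number of voxels")
    # Mean squared error over all voxels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def psnr_gain(reference, sparse, corrected, data_range):
    """PSNR of the corrected volume minus PSNR of the uncorrected sparse volume."""
    return psnr(reference, corrected, data_range) - psnr(reference, sparse, data_range)


# Toy, invented values (assumed intensity range [0, 1]); not data from the paper.
reference = [0.0, 0.5, 1.0, 0.5]
sparse = [0.2, 0.3, 0.8, 0.7]          # every voxel off by 0.2
corrected = [0.02, 0.48, 0.98, 0.52]   # every voxel off by 0.02

gain_db = psnr_gain(reference, sparse, corrected, data_range=1.0)
print(f"PSNR gain: {gain_db:.2f} dB")
```

In this toy case the error shrinks by a factor of 10 per voxel, so the mean squared error shrinks by a factor of 100 and the gain is roughly 20 dB. The gain does not depend on `data_range` because that term cancels in the difference.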

Paper Structure

This paper contains 14 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: U-Net architecture with 4 hierarchical levels, showing layer-specific dimensionality ($C \times H \times W \times D$), where C is the number of channels, H is height, W is width, and D is depth (all in voxels), and time-embedding (T). SiLU (Sigmoid Linear Unit) activations introduce non-linearity.
  • Figure 2: CBCT images (axial and coronal views) of a breast cancer patient (HyperSight dataset), from left to right showing the sparse volume (50 projections), corrected volume using the MInDI-3D model, ground truth volume and a difference plot (ground truth - corrected volume).
  • Figure 3: Perception-distortion trade-off in progressive sampling of MInDI-3D on the HyperSight test set with 50 projections. The line plot compares fidelity (PSNR) and perceptual quality (FD DINOv2) across sampling steps (1-10). Sampling with 2-5 steps improves distortion (higher PSNR) compared to 1 step, while further steps enhance realism (lower FD DINOv2) at the expense of fidelity. Adjusting sampling steps enables precise control over realism and fidelity: steps beyond 2 prioritise perceptual quality, but optimal step counts may vary by anatomy.
  • Figure 4: Comparison of the MInDI-3D prediction of a lung tumour (lower right lung lobe) from a sparse 50-projection reconstruction with 1 vs. 30 steps (with the ground truth and the difference of step 1 - step 30 as reference). Sharpness and detail increase from step 1 to step 30.
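
The step counts in Figures 3 and 4 refer to the iterative sampling loop inherited from InDI. As a rough sketch of how such a loop can look, the code below uses the deterministic InDI update $x_{t-\delta} = \frac{\delta}{t} F(x_t, t) + (1 - \frac{\delta}{t})\, x_t$ with $\delta = 1/N$, starting from the sparse-view volume at $t = 1$. This is an illustration under assumptions, not the MInDI-3D implementation: `indi_sample`, the flattened-list volume, and the toy `halfway_restorer` standing in for the trained 3D network are all hypothetical, and any noise perturbation the actual method may apply is omitted.

```python
def indi_sample(sparse_volume, restore, num_steps):
    """Deterministic InDI-style sampling from t = 1 (degraded) to t = 0 (restored).

    `restore(x, t)` is a hypothetical callable standing in for a trained network
    that predicts the clean volume from the current iterate x at time t.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(sparse_volume)
    for i in range(num_steps):
        t = (num_steps - i) / num_steps       # current time, from 1 down to 1/N
        weight = 1.0 / (num_steps - i)        # equals delta / t with delta = 1/N
        prediction = restore(x, t)
        # Move a fraction delta/t of the way from the iterate to the prediction.
        x = [weight * p + (1.0 - weight) * v for p, v in zip(prediction, x)]
    return x


# Toy stand-in for a network (an assumption for illustration only):
# it moves each voxel halfway toward a fixed target volume.
target = [1.0]


def halfway_restorer(x, t):
    return [(v + g) / 2.0 for v, g in zip(x, target)]


one_step = indi_sample([0.0], halfway_restorer, num_steps=1)
two_steps = indi_sample([0.0], halfway_restorer, num_steps=2)
print(one_step, two_steps)
```

With a single step the weight is 1, so the output is simply the restorer's prediction for the sparse input. With more steps the restorer is re-applied to partially restored iterates, which is why the output changes with the step count and why the step count can act as a knob between fidelity and perceptual quality.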