Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
Hyungjin Chung, Dohoon Ryu, Michael T. McCann, Marc L. Klasky, Jong Chul Ye
TL;DR
This work tackles 3D inverse problems in medical imaging by integrating a pretrained 2D diffusion prior with a model-based regularizer, enabling coherent 3D reconstructions from severely undersampled measurements. The proposed DiffusionMBIR applies the 2D diffusion prior to each xy-slice independently (slice by slice along the z-axis). It couples the slices through a 3D ADMM data-consistency step with a total variation (TV) prior along the z-direction, which keeps memory usage low enough for a single commodity GPU. Across sparse-view CT, limited-angle CT, and compressed sensing MRI, it delivers state-of-the-art results and generalizes robustly to out-of-distribution data, even with minimal 3D training data. The approach offers a practical, scalable route to high-fidelity 3D reconstruction by combining 2D diffusion priors with MBIR-style optimization, with strong implications for clinical imaging workflows.
Abstract
Diffusion models have emerged as the new state-of-the-art generative models, producing high-quality samples and exhibiting intriguing properties such as mode coverage and high flexibility. They have also been shown to be effective inverse problem solvers: the trained model acts as the prior over the data distribution, while information about the forward model can be injected at the sampling stage. Nonetheless, because the generative process operates in a space of the same high dimension as the data, these models have not been extended to 3D inverse problems due to the extremely high memory and computational cost. In this paper, we combine ideas from conventional model-based iterative reconstruction (MBIR) with modern diffusion models, which leads to a highly effective method for solving 3D medical image reconstruction tasks such as sparse-view tomography, limited-angle tomography, and compressed sensing MRI using pre-trained 2D diffusion models. In essence, we propose to augment the 2D diffusion prior with a model-based prior in the remaining direction at test time, so that reconstructions are coherent across all dimensions. Our method runs on a single commodity GPU and establishes a new state of the art, performing reconstructions of high fidelity and accuracy even in the most extreme cases (e.g. 2-view 3D tomography). We further show that the proposed method generalizes surprisingly well: it can reconstruct volumes that are entirely different from the training dataset.
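The idea of augmenting a 2D prior with a model-based prior along z can be illustrated with a toy sketch. It alternates a slice-wise 2D step with an ADMM solve of min_x 1/2||Ax - y||^2 + lam*||D_z x||_1, where D_z is the finite difference along z. This is not the authors' implementation, and several parts are assumptions. A binary sampling mask stands in for the forward operator A (in place of a CT projector or undersampled Fourier operator). A hypothetical box blur, `denoise_2d_slice`, replaces the pretrained score network. Conjugate gradient (CG) is used for the ADMM x-update. The sampler's noise schedule is omitted entirely.

```python
import numpy as np


def dz(x):
    """Forward difference along the z-axis (axis 0): (D_z x)[k] = x[k+1] - x[k]."""
    return x[1:] - x[:-1]


def dz_adjoint(v):
    """Adjoint of dz, so that <dz(x), v> == <x, dz_adjoint(v)>."""
    out = np.zeros((v.shape[0] + 1,) + v.shape[1:], dtype=v.dtype)
    out[:-1] -= v
    out[1:] += v
    return out


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def conjugate_gradient(apply_h, b, x0, n_iter=20, tol=1e-12):
    """Plain CG for H x = b with a symmetric positive (semi)definite operator H."""
    x = x0.copy()
    r = b - apply_h(x)
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        if rs < tol:  # already converged; also avoids a 0/0 step
            break
        hp = apply_h(p)
        alpha = rs / np.vdot(p, hp)
        x += alpha * p
        r -= alpha * hp
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def admm_tv_z(x0, y, mask, lam=0.05, rho=1.0, n_outer=10, n_cg=10):
    """ADMM for min_x 0.5*||M x - y||^2 + lam*||D_z x||_1 (assumption: A = binary mask M)."""
    apply_h = lambda v: mask * v + rho * dz_adjoint(dz(v))  # A^T A + rho D_z^T D_z
    x = x0.copy()
    z = dz(x)
    u = np.zeros_like(z)
    aty = mask * y
    for _ in range(n_outer):
        b = aty + rho * dz_adjoint(z - u)
        x = conjugate_gradient(apply_h, b, x, n_iter=n_cg)  # warm start at current x
        z = soft_threshold(dz(x) + u, lam / rho)
        u = u + dz(x) - z
    return x


def denoise_2d_slice(img):
    """HYPOTHETICAL stand-in for a pretrained 2D diffusion denoising step:
    just a 3x3 box blur with edge padding."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def reconstruct(y, mask, n_steps=5, **admm_kwargs):
    """Toy outer loop: slice-wise 2D prior, then a 3D ADMM-TV_z data-consistency step."""
    x = mask * y  # zero-filled initial estimate
    for _ in range(n_steps):
        # The 2D prior only sees one xy-slice at a time, so slices can be batched.
        x = np.stack([denoise_2d_slice(s) for s in x])
        # The 3D step couples slices through D_z and re-imposes the measurements.
        x = admm_tv_z(x, y, mask, **admm_kwargs)
    return x


rng = np.random.default_rng(0)
vol = rng.random((8, 16, 16))
mask = (rng.random(vol.shape) < 0.3).astype(float)
rec = reconstruct(mask * vol, mask)
print(rec.shape)  # (8, 16, 16)
```

In this sketch, memory for the 2D step scales with one slice rather than the whole volume. Only the cheap TV_z term and the forward operator act on the full 3D array, which mirrors the motivation stated above.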
