Local Patches Meet Global Context: Scalable 3D Diffusion Priors for Computed Tomography Reconstruction
Taewon Yang, Jason Hu, Jeffrey A. Fessler, Liyue Shen
TL;DR
The paper tackles the challenge of learning scalable 3D diffusion priors for high-resolution CT reconstruction under limited data and compute. It introduces a global-aware 3D patch diffusion model that jointly models local 3D patches and a downsampled global volume, enabling efficient generation and accurate reconstruction of 3D CT volumes. In extensive experiments on the LIDC-IDRI and AAPM datasets, the approach achieves state-of-the-art sparse-view CT reconstruction with faster inference than the baselines, and detailed ablations examine its key design choices. The work highlights the value of integrating local patch statistics with global context to form a coherent 3D prior, and it discusses avenues for improving robustness to less structured data.
Abstract
Diffusion models learn strong image priors that can be leveraged to solve inverse problems such as medical image reconstruction. However, for real-world applications such as 3D Computed Tomography (CT) imaging, directly training diffusion models on 3D data is challenging because it demands extensive GPU resources and large-scale datasets. Most existing works reuse 2D diffusion priors to address 3D inverse problems, but they do not fully exploit the generative capacity of diffusion models for high-dimensional data. In this study, we propose a novel 3D patch-based diffusion model that learns a fully 3D diffusion prior from limited data, enabling scalable generation of high-resolution 3D images. Our core idea is to learn the prior over 3D patches for scalability, while coupling local and global information to ensure high-quality 3D image generation; specifically, we model the joint distribution of position-aware local 3D patches and a downsampled 3D volume that serves as global context. Our approach not only enables high-quality 3D generation but also offers an unprecedentedly efficient and accurate solution to high-resolution 3D inverse problems. Experiments on 3D CT reconstruction across multiple datasets show that our method outperforms state-of-the-art methods in both performance and efficiency, notably achieving high-resolution 3D reconstruction of a $512 \times 512 \times 256$ volume in about 20 minutes.
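The joint modeling idea becomes more concrete when you look at what a single training example might contain. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation. It assumes a NumPy volume and a uniformly random patch location. It uses normalized per-voxel coordinate channels as the "position-aware" signal and block averaging as the downsampling operator. The function name `make_training_sample` and all of its parameters are hypothetical, and this section does not specify how the network actually consumes the global volume.

```python
import numpy as np


def make_training_sample(volume, patch_size, down_factor, rng):
    """Hypothetical sketch: build one (patch, position, global context) example.

    Assumes `volume` is a 3D array whose dimensions are each > 1 and
    divisible by `down_factor`. This is an illustration, not the paper's code.
    """
    D, H, W = volume.shape
    pd, ph, pw = patch_size

    # Pick a random patch corner so the patch fits inside the volume
    # (Generator.integers excludes the upper bound).
    z = rng.integers(0, D - pd + 1)
    y = rng.integers(0, H - ph + 1)
    x = rng.integers(0, W - pw + 1)
    patch = volume[z:z + pd, y:y + ph, x:x + pw]

    # Position awareness (assumed form): per-voxel coordinates of the patch,
    # normalized to [-1, 1] relative to the full volume.
    zz, yy, xx = np.meshgrid(
        np.arange(z, z + pd), np.arange(y, y + ph), np.arange(x, x + pw),
        indexing="ij",
    )
    coords = np.stack([
        2 * zz / (D - 1) - 1,
        2 * yy / (H - 1) - 1,
        2 * xx / (W - 1) - 1,
    ])

    # Global context (assumed operator): downsample by averaging
    # non-overlapping f x f x f blocks.
    f = down_factor
    global_vol = volume.reshape(D // f, f, H // f, f, W // f, f).mean(axis=(1, 3, 5))

    # Stack the patch with its coordinate channels as a 4-channel input.
    net_input = np.concatenate([patch[None], coords], axis=0)
    return patch, coords, global_vol, net_input


rng = np.random.default_rng(0)
volume = np.arange(8 * 8 * 8, dtype=float).reshape(8, 8, 8)
patch, coords, global_vol, net_input = make_training_sample(volume, (4, 4, 4), 2, rng)
print(patch.shape, coords.shape, global_vol.shape, net_input.shape)
# (4, 4, 4) (3, 4, 4, 4) (4, 4, 4) (4, 4, 4, 4)
```

In this sketch, the patch and its coordinates are stacked as input channels. The low-resolution volume is returned separately, so a model could condition on it or denoise it jointly with the patch. A practical pipeline would also need to handle volume sizes that are not divisible by the downsampling factor.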
