Diffusion Active Learning: Towards Data-Driven Experimental Design in Computed Tomography
Luis Barba, Johannes Kirschner, Tomas Aidukas, Manuel Guizar-Sicairos, Benjamín Béjar
TL;DR
This work introduces Diffusion Active Learning (DAL), a data-driven framework that integrates score-based diffusion priors with sequential experimental design to reduce CT data requirements while improving reconstructions. DAL pre-trains an unconditional diffusion model on domain tomographic data and then performs diffusion-based posterior sampling conditioned on the current measurements. The resulting samples give a data-aware estimate of uncertainty, and the next projection angle is chosen by maximizing posterior variance. Across three real-world CT datasets the approach shows substantial data-efficiency gains and, in many cases, outperforms dataset-agnostic baselines, reaching similar PSNR with up to roughly 4× fewer measurements and with faster inference. Beyond CT, the methodology applies to other linear and non-linear inverse problems with differentiable forward processes (e.g., MRI), offering practical reductions in acquisition time and radiation dose where data collection is costly. Limitations include the need for domain data to train the diffusion prior and potential bias when samples lie outside the training distribution, which motivates future work on out-of-distribution robustness and foundation-model pre-training on diverse tomographic data.
Abstract
We introduce Diffusion Active Learning, a novel approach that combines generative diffusion modeling with data-driven sequential experimental design to adaptively acquire data for inverse problems. Although the approach is broadly applicable, we focus on scientific computed tomography (CT) for experimental validation, where structured prior datasets are available and where reducing data requirements translates directly into shorter measurement times and lower X-ray doses. We first pre-train an unconditional diffusion model on domain-specific CT reconstructions. The diffusion model acts as a learned, data-dependent prior that captures the structure of the underlying data distribution. This prior is used in two ways: it drives the active learning process, and it improves the quality of the reconstructions. During the active learning loop, we employ a variant of diffusion posterior sampling to generate samples from the posterior distribution that are consistent with the current measurements. Using these samples, we quantify the uncertainty in the current estimate and select the most informative next measurement. Our results show substantial reductions in data acquisition requirements, corresponding to lower X-ray doses, while simultaneously improving image reconstruction quality across multiple real-world tomography datasets.
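The measurement-selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. The posterior samples are taken as given; here they are small synthetic arrays standing in for the output of diffusion posterior sampling. The function `project` is a hypothetical, crude nearest-bin parallel-beam projector used in place of a real CT forward operator. The score for a candidate angle is assumed to be the summed per-detector variance of the projected samples, which is one plausible reading of "posterior-variance maximization". The names `project`, `select_next_angle`, and `n_det` are illustrative only.

```python
import numpy as np


def project(image, theta, n_det):
    """Crude nearest-bin parallel-beam projection of a square image.

    Hypothetical stand-in for a real CT forward operator.
    """
    n = image.shape[0]
    coords = np.arange(n) - (n - 1) / 2.0
    xx, yy = np.meshgrid(coords, coords)
    # Signed distance of each pixel centre along the detector axis.
    t = xx * np.cos(theta) + yy * np.sin(theta)
    half = (n - 1) / np.sqrt(2.0)  # largest possible |t| over all angles
    idx = np.round((t + half) / (2.0 * half) * (n_det - 1)).astype(int)
    idx = np.clip(idx, 0, n_det - 1)
    # Sum the pixel values that fall into each detector bin.
    return np.bincount(idx.ravel(), weights=image.ravel(), minlength=n_det)


def select_next_angle(samples, candidate_angles, n_det):
    """Return the index of the angle with the largest summed projection variance.

    Assumed criterion: variance is measured in measurement space,
    across the posterior samples, and summed over detector bins.
    """
    scores = []
    for theta in candidate_angles:
        sinos = np.stack([project(s, theta, n_det) for s in samples])
        scores.append(sinos.var(axis=0).sum())
    scores = np.array(scores)
    return int(np.argmax(scores)), scores


# Synthetic "posterior samples": a constant image plus a column-stripe
# pattern with uncertain amplitude, so the samples disagree mainly along x.
n = 8
pattern = np.tile((-1.0) ** np.arange(n), (n, 1))
amplitudes = np.linspace(-1.0, 1.0, 5)
samples = 1.0 + amplitudes[:, None, None] * pattern

angles = np.linspace(0.0, np.pi, 4, endpoint=False)
best, scores = select_next_angle(samples, angles, n_det=16)
print("selected angle (rad):", angles[best])
print("scores per candidate:", scores)
```

In this toy setting the samples differ only in the amplitude of a column-stripe pattern, so the projection that resolves individual columns shows the most disagreement across samples and is selected. In a full active learning loop, the chosen angle would be measured, appended to the conditioning data, and the posterior samples regenerated before the next selection.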
