A Unified Framework for Multimodal Image Reconstruction and Synthesis using Denoising Diffusion Models
Weijie Gan, Xucheng Wang, Tongyao Wang, Wenshang Wang, Chunwei Ying, Yuyang Hu, Yasheng Chen, Hongyu An, Ulugbek S. Kamilov
TL;DR
Any2all is a unified diffusion-based framework that treats multimodal image reconstruction and synthesis as a single virtual inpainting problem. It trains one unconditional DDPM on the complete multimodal data stack and applies task-adaptive samplers (MPS and MDS) at inference to map any available input configuration to all desired modalities. On a PET/MR/CT brain dataset, the approach achieves competitive distortion-based performance and superior perceptual quality across reconstruction and synthesis tasks. Replacing many task-specific models with one flexible model could simplify clinical workflows, though slower inference is a trade-off that motivates future acceleration work. The results illustrate both the versatility of a unified generative prior for multimodal imaging tasks and the tension between perceptual realism and quantitative fidelity in practical deployments.
Abstract
Image reconstruction and image synthesis are important for handling incomplete multimodal imaging data, but existing methods require a separate task-specific model for each setting, which complicates training and deployment. We introduce Any2all, a unified framework that addresses this limitation by formulating these disparate tasks as a single virtual inpainting problem. We train a single unconditional diffusion model on the complete multimodal data stack. At inference time, this model is adapted to "inpaint" all target modalities from any combination of available clean images or noisy measurements. We validate Any2all on a PET/MR/CT brain dataset. Our results show that Any2all performs well on both multimodal reconstruction and synthesis tasks, consistently yielding images with competitive distortion-based performance and superior perceptual quality compared with specialized methods.
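The virtual inpainting idea described above can be illustrated with a minimal sketch. It is not the Any2all sampler: the MPS and MDS samplers are not specified here, so the code substitutes a generic replacement-based inpainting loop around a standard DDPM ancestral step. It covers only the case where the available modalities are clean images, not noisy measurements. The function names, the linear noise schedule, the three-channel stack layout, and `dummy_denoiser` (a placeholder for a trained unconditional network) are all assumptions made for illustration.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumed choice) and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_bar = np.cumprod(1.0 - betas)
    return betas, alphas_bar


def dummy_denoiser(x_t, t):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would be trained unconditionally on the full modality stack;
    this placeholder simply predicts zero noise so the sketch runs.
    """
    return np.zeros_like(x_t)


def inpaint_sample(known, mask, denoiser, num_steps=50, seed=0):
    """Fill in the channels where mask == 0, given the channels where mask == 1.

    known: array of shape (modalities, H, W); values where mask == 0 are ignored.
    mask:  same shape, 1 for available modalities, 0 for targets to generate.
    """
    rng = np.random.default_rng(seed)
    betas, alphas_bar = make_schedule(num_steps)
    x = rng.standard_normal(known.shape)  # start from pure noise

    for t in range(num_steps - 1, -1, -1):
        # Diffuse the known channels to the current noise level and
        # overwrite them in the running sample.
        noise = rng.standard_normal(known.shape)
        known_t = np.sqrt(alphas_bar[t]) * known + np.sqrt(1.0 - alphas_bar[t]) * noise
        x = mask * known_t + (1.0 - mask) * x

        # Standard DDPM ancestral update using the predicted noise.
        eps = denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean

    # Restore the clean known channels in the final output.
    return mask * known + (1.0 - mask) * x


# Example: a toy (PET, MR, CT) stack where only the MR channel is available.
stack = np.zeros((3, 8, 8))
stack[1] = 1.0
mask = np.zeros_like(stack)
mask[1] = 1.0
result = inpaint_sample(stack, mask, dummy_denoiser, num_steps=20)
```

Changing `mask` is the only thing needed to switch between input configurations, which is the sense in which one unconditional model can serve many tasks. With the placeholder denoiser the generated channels are not meaningful images.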
