Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis
Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Daniel C. Alexander, Le Zhang
TL;DR
This work tackles missing data in multi-modal brain MRI and cardiac MRI without relying on externally supplied masks. It introduces CoPeDiT, a two-stage framework comprising a completeness-perception tokenizer (CoPeVAE), which learns discriminative prompts via self-supervised pretext tasks, and a 3D diffusion transformer (MDiT3D), which uses these prompts to guide unified 3D MRI synthesis. The approach improves robustness and semantic coherence across varying missing patterns, as demonstrated on three large-scale datasets (BraTS, IXI, and UKBB), with ablations confirming the benefit of the pretext tasks and of learned prompts over traditional mask codes. The work highlights the potential of self-perception in diffusion-based medical image synthesis, offering mask-free and flexible reconstruction of missing data.
Abstract
Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance, such as manually specified masks, to describe the missing state in detail and instruct generative models to synthesize the missing MRIs. However, such manual indicators are not always available or reliable in real-world scenarios due to the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to improve semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and we design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis that uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at https://github.com/JK-Liu7/CoPeDiT .
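
To make the contrast between an external mask code and self-perceived completeness concrete, the sketch below shows one way self-supervised targets could be built from complete cases. It is illustrative only and is not taken from the CoPeDiT repository: the modality list, the function names `mask_code` and `make_pretext_sample`, and the two targets (number of missing modalities and their positions) are all assumptions, since the abstract does not specify the actual pretext tasks.

```python
import random

# Assumed BraTS-style modality order; hypothetical, for illustration only.
MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]


def mask_code(available):
    """Binary code that mask-based methods receive from outside.

    1 marks a modality that is present, 0 marks a missing one.
    """
    return [1 if m in available else 0 for m in MODALITIES]


def make_pretext_sample(volumes, rng):
    """Randomly drop modalities from a complete case.

    volumes: dict mapping modality name -> volume (any placeholder object).
    Returns the incomplete input and the targets a tokenizer could be
    trained to predict from that input alone, without an external mask.
    """
    # Drop between 1 and len-1 modalities so at least one always remains.
    n_missing = rng.randint(1, len(MODALITIES) - 1)
    dropped = set(rng.sample(MODALITIES, n_missing))
    incomplete = {m: v for m, v in volumes.items() if m not in dropped}
    targets = {
        "num_missing": n_missing,
        # 1 marks a missing position (the inverse of the mask code).
        "missing_positions": [1 - b for b in mask_code(incomplete.keys())],
    }
    return incomplete, targets


if __name__ == "__main__":
    complete_case = {m: f"<{m} volume>" for m in MODALITIES}
    sample, targets = make_pretext_sample(complete_case, random.Random(0))
    print(sorted(sample), targets)
```

In this framing the mask code is used only to generate training labels; at inference time a model trained this way would receive the incomplete input alone and infer the missing state itself, which is the behavior the abstract argues for.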
