Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Daniel C. Alexander, Le Zhang

TL;DR

This work tackles missing data in multi-modal brain MRI and cardiac MRI without relying on externally supplied masks. It introduces CoPeDiT, a two-stage framework comprising a completeness-perception tokenizer (CoPeVAE), which learns discriminative prompts via self-supervised pretext tasks, and a 3D diffusion transformer (MDiT3D), which uses these prompts to guide unified 3D MRI synthesis. Evaluations on the BraTS, IXI, and UKBB datasets report improved robustness and semantic coherence across varying missing patterns, and ablations indicate that the pretext tasks and learned prompts outperform traditional binary mask codes. The work highlights the potential of self-perception in diffusion-based medical image synthesis, aiming at mask-free, flexible reconstruction of missing data in clinical settings.

Abstract

Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance that specifies the missing state in detail in order to instruct generative models to synthesize the missing MRIs. However, such manual indicators are not always available or reliable in real-world scenarios, given the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to guide improvements in semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and we design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis, which uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at https://github.com/JK-Liu7/CoPeDiT
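To make the contrast with explicit masks concrete, the following minimal sketch shows what an externally supplied binary mask code might look like for four brain MRI modalities. It is only an illustration: the function name `binary_mask_code`, the constant `BRAIN_MODALITIES`, and the modality ordering are assumptions made here, not part of the CoPeDiT code base.

```python
# Hypothetical illustration of an external binary mask code.
# Names and ordering are assumptions, not taken from the CoPeDiT repository.
BRAIN_MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def binary_mask_code(available, all_items=BRAIN_MODALITIES):
    """Return 1 for each item that is present and 0 for each missing one."""
    unknown = set(available) - set(all_items)
    if unknown:
        raise ValueError(f"unknown items: {sorted(unknown)}")
    return [1 if name in available else 0 for name in all_items]


# A scan where only T1 and T2 were acquired.
code = binary_mask_code({"T1", "T2"})
print(code)  # [1, 0, 1, 0]
```

A code of this kind records only which inputs are absent and must be provided by hand for every case. It carries no anatomical or pathological information, which is the limitation that the learned completeness-aware prompts are meant to address.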

Paper Structure (22 sections, 5 equations, 10 figures, 14 tables)


Figures (10)

  • Figure 1: The motivation of CoPeDiT. (a) Comparison of completeness perception between CoPeDiT and prior methods. (b) Prompt tokens offer more effective guidance than binary mask codes. (c) CoPeVAE enables more discriminative latent representations. (d) Prompt tokens yield more semantically consistent attention maps in MDiT3D blocks, highlighting stronger interactions between similar modalities (e.g., T1-T1ce, T2-FLAIR).
  • Figure 2: Overview of the CoPeVAE framework. We implement two variants, CoPeVAE-B and CoPeVAE-C, with slight architectural modifications for brain and cardiac MRI synthesis tasks, respectively.
  • Figure 3: Overview of the MDiT3D framework. PE and RoPE denote positional embeddings and rotary position embeddings [SU2024127063, yang2024cogvideox], respectively.
  • Figure 4: Qualitative results on BraTS dataset. The results are depicted in the axial (top), sagittal (middle) and coronal (bottom) views of the 3D MRI volume. The visual results on IXI dataset are provided in the supplementary material.
  • Figure 5: Qualitative results of comparison on UKBB dataset. The top and bottom results correspond to the first and last missing slices within a given volume, respectively.
  • ...and 5 more figures