DEMIST: Decoupled Multi-stream latent diffusion for Quantitative Myelin Map Synthesis

Jiacheng Wang, Hao Li, Xing Yao, Ahmad Toubasi, Taegan Vinarsky, Caroline Gheen, Joy Derwenskus, Chaoyang Jin, Richard Dortch, Junzhong Xu, Francesca Bagnato, Ipek Oguz

TL;DR

The paper addresses the difficulty of obtaining pool size ratio (PSR) maps derived from quantitative magnetization transfer (qMT) imaging by synthesizing PSR from standard MRI sequences. It introduces DEMIST, a two-stage 3D latent diffusion framework that translates T1w+FLAIR to PSR using a frozen BraTS-pretrained diffusion backbone with decoupled conditioning streams: semantic cross-attention, 3D ControlNet residuals, and adaptive LoRA on attention. Stage 1 learns aligned latent representations for PSR and conditioning images via independent 3D KL autoencoders; Stage 2 integrates the conditioning while preserving pretrained priors through a data-efficient multi-stream conditioning scheme and edge-aware losses. Evaluated on 163 scans from 99 subjects with 5-fold CV, DEMIST outperforms GAN and diffusion baselines in PSNR, SSIM, MSE, and lesion-detection-related metrics, producing sharper boundaries and better quantitative fidelity. The work enables PSR synthesis without lengthy qMT protocols, with potential clinical impact for MS assessment, though inference speed and cross-site generalization remain open directions for future work.
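
Stage 1 relies on KL-regularized autoencoders to build the latent spaces. As a hedged illustration rather than the authors' code, the NumPy sketch below shows the standard diagonal-Gaussian reparameterization and the KL penalty toward N(0, I) that such autoencoders commonly use. The function names and the latent shape (4 channels on an 8x8x8 grid) are hypothetical choices for the example.

```python
import numpy as np


def sample_latent(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps from a diagonal Gaussian posterior."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed per sample, averaged over the batch."""
    kl = 0.5 * (mu**2 + np.exp(logvar) - 1.0 - logvar)
    # Sum over channels and the three spatial axes, then average over the batch axis.
    return kl.reshape(kl.shape[0], -1).sum(axis=1).mean()


rng = np.random.default_rng(0)
# Hypothetical 3D latent: batch of 2, 4 channels, 8x8x8 voxels.
mu = np.zeros((2, 4, 8, 8, 8))
logvar = np.zeros_like(mu)
z = sample_latent(mu, logvar, rng)
print(z.shape)  # (2, 4, 8, 8, 8)
print(kl_to_standard_normal(mu, logvar))  # 0.0
```

A posterior that already matches N(0, I) incurs zero penalty; in practice the KL term is usually given a small weight so the latent stays smooth without discarding detail needed for reconstruction.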

Abstract

Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which is valuable for multiple sclerosis (MS) assessment. However, qMT requires specialized 20-30 minute scans. We propose DEMIST to synthesize PSR maps from standard T1w and FLAIR images using a 3D latent diffusion model with three complementary conditioning mechanisms. Our approach has two stages: first, we train separate autoencoders for PSR and anatomical images to learn aligned latent representations. Second, we train a conditional diffusion model in this latent space on top of a frozen diffusion foundation backbone. Conditioning is decoupled into: (i) semantic tokens via cross-attention, (ii) spatial per-scale residual hints via a 3D ControlNet branch, and (iii) adaptive LoRA-modulated attention. We include edge-aware loss terms to preserve lesion boundaries and alignment losses to maintain quantitative consistency, while keeping the number of trainable parameters low and retaining the inductive bias of the pretrained model. We evaluate on 163 scans from 99 subjects using 5-fold cross-validation. Our method outperforms VAE, GAN and diffusion baselines on multiple metrics, producing sharper boundaries and better quantitative agreement with ground truth. Our code is publicly available at https://github.com/MedICL-VU/MS-Synthesis-3DcLDM.
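
The adaptive stream applies LoRA to the attention projections of a frozen backbone. The NumPy sketch below assumes the common LoRA parameterization W x + (alpha / r) B A x with B initialized to zero; the class name LoRALinear, the rank, alpha, and the random stand-in for a query projection are illustrative assumptions, not values taken from the paper or its repository.

```python
import numpy as np


class LoRALinear:
    """Frozen projection W plus a low-rank update (alpha / r) * B @ A.

    Hypothetical sketch: only A and B would be trained; W stays frozen,
    mirroring how a pretrained backbone's attention weights are kept fixed.
    """

    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = weight.shape
        self.weight = weight  # frozen pretrained projection, shape (d_out, d_in)
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))  # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (tokens, d_in) -> (tokens, d_out)
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(0)
d_model = 64
w_q = rng.standard_normal((d_model, d_model))  # stand-in for a frozen query projection
lora_q = LoRALinear(w_q, rank=4, alpha=8.0, rng=rng)
tokens = rng.standard_normal((10, d_model))

# With B at zero, the adapted layer reproduces the frozen backbone exactly.
print(np.allclose(lora_q(tokens), tokens @ w_q.T))  # True
print(lora_q.trainable_params(), w_q.size)  # 512 4096
```

The zero initialization means training starts from the pretrained backbone's behavior, and the rank-4 adapter here trains 512 parameters against 4096 frozen ones, which is the kind of trade-off behind keeping trainable parameters low while retaining the pretrained inductive bias.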

Paper Structure

This paper contains 5 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: Overview of our framework. Stage 1: A latent diffusion model learns in PSR latent space. Stage 2: A pretrained diffusion backbone receives three complementary conditioning streams (semantic tokens via cross-attention, spatial residual hints from a 3D ControlNet, and adaptive LoRA on the attention projections) while its weights stay frozen. Edge-aware objective: gradient-magnitude maps from a 3D Sobel operator sharpen generated brain tissue structures.
  • Figure 2: Cross-validation fold averages (5-fold CV over 163 scans). Best in bold. *: statistically significant improvement over the best baseline (cWDM).
  • Figure 3: ROC curves discriminating T2 lesions from NAWM using PSR values. Colors indicate different methods (blue: ground truth, orange: ours). Solid lines: combined NAWM; dashed lines: proximal and distal NAWM.
  • Figure 4: Qualitative results. Each row shows the conditions (MPRAGE, FLAIR), the ground-truth PSR, and synthesized outputs from our method and the baselines. Zoom panels highlight that our method produces sharper lesion boundaries (top row, an MS patient) and more accurate anatomy (putamen shape, bottom row, a healthy control).
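
Figure 1's edge-aware objective is built on gradient-magnitude maps from a 3D Sobel operator. A minimal NumPy sketch follows, assuming a separable Sobel kernel (a [-1, 0, 1] derivative along one axis with [1, 2, 1] smoothing along the other two), replicate padding at the borders, and an L1 distance between the synthesized and reference magnitude maps. The exact distance and weighting used in DEMIST are not given here, so the helper names and the L1 choice are hypothetical.

```python
import numpy as np


def _filter_1d(vol, kernel, axis):
    """Correlate a 3-tap kernel along one axis with edge (replicate) padding."""
    pad = [(0, 0)] * vol.ndim
    pad[axis] = (1, 1)
    padded = np.pad(vol, pad, mode="edge")
    n = vol.shape[axis]
    out = np.zeros_like(vol, dtype=float)
    for k, w in enumerate(kernel):
        # Shifted view: out[i] += w * vol[i + k - 1] along the chosen axis.
        out += w * np.take(padded, np.arange(k, k + n), axis=axis)
    return out


def sobel_magnitude_3d(vol):
    """Gradient magnitude from separable 3D Sobel filters (derivative x smooth x smooth)."""
    deriv = (-1.0, 0.0, 1.0)
    smooth = (1.0, 2.0, 1.0)
    grads = []
    for axis in range(3):
        g = vol.astype(float)
        for other in range(3):
            g = _filter_1d(g, deriv if other == axis else smooth, other)
        grads.append(g)
    return np.sqrt(sum(g**2 for g in grads))


def edge_aware_l1(pred, target):
    """Hypothetical edge-aware term: mean L1 gap between Sobel gradient-magnitude maps."""
    return np.mean(np.abs(sobel_magnitude_3d(pred) - sobel_magnitude_3d(target)))


ramp = np.tile(np.arange(6.0)[:, None, None], (1, 6, 6))  # intensity rises along axis 0
mag = sobel_magnitude_3d(ramp)
print(mag[2, 2, 2])  # 32.0
print(edge_aware_l1(ramp, ramp))  # 0.0
```

For the unit ramp, the central difference along axis 0 is 2 and each [1, 2, 1] smoothing pass multiplies it by 4, giving 32 in the interior. Penalizing differences in these maps, rather than in raw intensities alone, pushes the generator to reproduce where edges sit and how strong they are, which is the intent behind sharpening lesion and tissue boundaries.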