DEMIST: Decoupled Multi-stream latent diffusion for Quantitative Myelin Map Synthesis

Jiacheng Wang; Hao Li; Xing Yao; Ahmad Toubasi; Taegan Vinarsky; Caroline Gheen; Joy Derwenskus; Chaoyang Jin; Richard Dortch; Junzhong Xu; Francesca Bagnato; Ipek Oguz

TL;DR

The paper addresses the challenge of obtaining quantitative magnetization transfer-derived PSR maps by synthesizing PSR from standard MRI sequences. It introduces DEMIST, a two-stage 3D latent diffusion framework that translates T1w+FLAIR to PSR using a frozen BraTS-pretrained diffusion backbone with decoupled conditioning streams: semantic cross-attention, 3D ControlNet residuals, and adaptive LoRA on attention. Stage 1 learns aligned latent representations for PSR and conditioning images via independent 3D KL autoencoders, and Stage 2 integrates conditioning while preserving pretrained priors through a data-efficient, multi-stream conditioning scheme and edge-aware losses. Evaluated on 163 scans from 99 subjects with 5-fold CV, DEMIST outperforms GAN and diffusion baselines in PSNR, SSIM, MSE, and lesion-detection-related metrics, demonstrating sharper boundaries and better quantitative fidelity. The work enables PSR synthesis without lengthy qMT protocols, with potential clinical impact for MS assessment, though inference speed and cross-site generalization remain future directions.
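Because the evaluation uses 163 scans from only 99 subjects, some subjects contribute more than one scan. The summary does not say how the folds were built, so the following is only a minimal sketch of one common way to do it: a subject-level split, in which all scans of a subject land in the same fold to avoid leakage between training and test data. The function name `subject_level_folds`, the input format, and the synthetic scan-per-subject counts are assumptions made for illustration, not details from the paper.

```python
import random
from collections import defaultdict


def subject_level_folds(scan_subjects, n_folds=5, seed=0):
    """Group scan indices into folds so that a subject never spans two folds.

    scan_subjects: list where entry i is the subject ID of scan i
                   (hypothetical input format, not from the paper).
    Returns a list of n_folds lists of scan indices.
    """
    scans_by_subject = defaultdict(list)
    for scan_idx, subject in enumerate(scan_subjects):
        scans_by_subject[subject].append(scan_idx)

    # Shuffle subjects reproducibly, then place subjects with the most scans first.
    subjects = sorted(scans_by_subject)
    random.Random(seed).shuffle(subjects)
    subjects.sort(key=lambda s: -len(scans_by_subject[s]))

    folds = [[] for _ in range(n_folds)]
    for subject in subjects:
        # Greedy balancing: add the subject to the fold with the fewest scans so far.
        smallest = min(folds, key=len)
        smallest.extend(scans_by_subject[subject])
    return folds


# Synthetic example only: 64 subjects with 2 scans and 35 with 1 scan
# gives 99 subjects and 163 scans. The real distribution is not stated.
scan_subjects = []
for s in range(99):
    scan_subjects += [f"sub-{s:03d}"] * (2 if s < 64 else 1)

folds = subject_level_folds(scan_subjects)
fold_sizes = [len(f) for f in folds]
```

Each fold then serves once as the held-out test set while the remaining four are used for training.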

Abstract

Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which is valuable for multiple sclerosis (MS) assessment. However, qMT requires specialized 20-30 minute scans. We propose DEMIST to synthesize PSR maps from standard T1w and FLAIR images using a 3D latent diffusion model with three complementary conditioning mechanisms. Our approach has two stages: first, we train separate autoencoders for PSR and anatomical images to learn aligned latent representations. Second, we train a conditional diffusion model in this latent space on top of a frozen diffusion foundation backbone. Conditioning is decoupled into: (i) semantic tokens via cross-attention, (ii) spatial per-scale residual hints via a 3D ControlNet branch, and (iii) adaptive LoRA-modulated attention. We include edge-aware loss terms to preserve lesion boundaries and alignment losses to maintain quantitative consistency, while keeping the number of trainable parameters low and retaining the inductive bias of the pretrained model. We evaluate on 163 scans from 99 subjects using 5-fold cross-validation. Our method outperforms VAE, GAN and diffusion baselines on multiple metrics, producing sharper boundaries and better quantitative agreement with ground truth. Our code is publicly available at https://github.com/MedICL-VU/MS-Synthesis-3DcLDM.
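To illustrate why low-rank adaptation keeps the trainable parameter count small while leaving the pretrained behaviour intact at the start of training, here is a minimal sketch of a standard LoRA update on a single linear projection, y = W x + (alpha / r) B (A x), with W frozen. It is not the authors' implementation: the abstract does not specify the rank, the scaling, which attention projections are adapted, or how the "adaptive" modulation works, so the dimensions `d` and `r`, the value of `alpha`, and the helper names are all assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_linear(x, W_frozen, A, B, alpha):
    """Standard LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W_frozen stays fixed; only A (r x d) and B (d x r) would be trained.
    """
    r = len(A)
    base = matvec(W_frozen, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


rng = random.Random(0)
d, r = 16, 2  # illustrative sizes, not taken from the paper

W_frozen = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # small random init
B = [[0.0] * r for _ in range(d)]                               # zero init
x = [rng.gauss(0, 1) for _ in range(d)]

# With B initialised to zero, the adapted layer reproduces the frozen layer.
y_init = lora_linear(x, W_frozen, A, B, alpha=4.0)
y_frozen = matvec(W_frozen, x)

trainable_params = r * d + d * r  # entries of A and B
frozen_params = d * d             # entries of W
```

In this toy setting the adapter adds 64 trainable values next to 256 frozen ones, and the gap widens as `d` grows relative to `r`.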
