MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution

Zhe Wang, Yuhua Ru, Aladine Chetouani, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane, Mohamed Jarraya, Yung Hsin Chen

TL;DR

This work addresses the limited spatial resolution of 3T MRI by introducing MoEDiff-SR, a Mixture-of-Experts-guided diffusion model that performs region-adaptive denoising via a transformer-based gating network. It deploys three anatomically specialized diffusion experts, covering the centrum semiovale, the cortex, and the grey-white matter junction, to produce 7T-like reconstructions from 3T inputs with enhanced cortical detail and sharper tissue interfaces. The model is trained on data corrected for gradient nonlinearity and bias field, using joint expert/gating losses and MAD-based supervision; it reports state-of-the-art perceptual and quantitative gains and supports asynchronous MoE inference for efficient deployment. Clinical evaluation on MS cases shows improved diagnostic visibility and closer agreement with ground-truth 7T delineation, underscoring the practical impact of region-aware diffusion SR in neuroimaging.

Abstract

Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR.
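
The token-level aggregation described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `softmax` and `aggregate_experts`, the use of a softmax to turn gating scores into probabilities, and the toy numbers are all assumptions made for illustration. Each token receives its own weights over the three experts, and the blended latent is the weighted sum of the experts' denoised outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_experts(gate_logits, expert_outputs):
    """Blend per-token expert outputs using gating probabilities.

    gate_logits:    [num_tokens][num_experts] raw gating scores (hypothetical).
    expert_outputs: [num_experts][num_tokens][dim] denoised latents per expert.
    Returns (blended [num_tokens][dim], weights [num_tokens][num_experts]).
    """
    num_experts = len(expert_outputs)
    blended, weights = [], []
    for t, logits in enumerate(gate_logits):
        w = softmax(logits)  # assumed normalisation; one distribution per token
        dim = len(expert_outputs[0][t])
        token = [
            sum(w[i] * expert_outputs[i][t][d] for i in range(num_experts))
            for d in range(dim)
        ]
        blended.append(token)
        weights.append(w)
    return blended, weights


# Toy example: 2 tokens, 3 experts, 2-dimensional latents.
gate_logits = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
expert_outputs = [
    [[1.0, 1.0], [1.0, 1.0]],  # expert 1 output for each token
    [[2.0, 2.0], [2.0, 2.0]],  # expert 2
    [[3.0, 3.0], [3.0, 3.0]],  # expert 3
]
blended, weights = aggregate_experts(gate_logits, expert_outputs)
```

In this toy input the first token has equal gating scores, so its blended latent is the plain average of the three experts, while the second token is dominated by expert 1. Inspecting `weights` per token is also what makes expert contributions interpretable in a scheme like this.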

Paper Structure

This paper contains 26 sections, 18 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The flowchart of the proposed global methodology.
  • Figure 2: The proposed MoEDiff-SR workflow starts by extracting multi-scale patches from the conditional input 3T MRI slice $y$. These patches are subsequently transformed via a linear projection layer, and the obtained embeddings are further encoded by a Swin Transformer $\mathcal{E}_2$, resulting in conditional latent representation $z_c$. Next, the gating network $\mathcal{G}$ dynamically computes adaptive weights ($\mathcal{G}_1$, $\mathcal{G}_2$, and $\mathcal{G}_3$) for three specialized diffusion-based SR experts ($E_1$, $E_2$, and $E_3$). Each expert follows the fundamental architecture of the Conditional Latent Diffusion Model (CLDM) [latent_diffusion_model] and receives the latent representation $z_0$ from the 7T MRI, $z_c$ from the 3T MRI, the timestep $t$, and its assigned weight $\mathcal{G}_i$. The experts then individually generate region-specific denoised latent outputs $\hat{z}_{0(i)}$. Finally, these outputs are aggregated based on the dynamically assigned weights to produce the final weighted latent denoised code $\hat{z}_0$.
  • Figure 3: Convergence analysis of gating loss under different configurations.
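
In the notation of the Figure 2 caption, the final aggregation step can plausibly be written as a weighted sum. This is a sketch inferred from the caption, not the paper's stated equation; in particular, the assumption that the three weights are non-negative and sum to one follows from the abstract's description of them as gating probabilities.

$$
\hat{z}_0 = \sum_{i=1}^{3} \mathcal{G}_i \, \hat{z}_{0(i)}, \qquad \mathcal{G}_i \ge 0, \quad \sum_{i=1}^{3} \mathcal{G}_i = 1
$$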