M2Diff: Multi-Modality Multi-Task Enhanced Diffusion Model for MRI-Guided Low-Dose PET Enhancement

Ghulam Nabi Ahmad Hassan Yar, Himashi Peiris, Victoria Mar, Cameron Dennis Pain, Zhaolin Chen

TL;DR

M2Diff is a multi-modality multi-task diffusion model that processes MRI and low-dose (LD) PET scans separately to learn modality-specific features, then fuses them via hierarchical feature fusion to reconstruct standard-dose (SD) PET.

Abstract

Positron emission tomography (PET) scans expose patients to radiation, which can be mitigated by reducing the dose, albeit at the cost of diminished image quality. This makes low-dose (LD) PET recovery an active research area. Previous studies have used deep learning to recover standard-dose (SD) PET from LD PET scans and/or multi-modal scans, e.g., PET/CT or PET/MRI. These studies incorporate multi-modal information by conditioning a single-task model, an approach that may limit the model's capacity to extract modality-specific features and can lead to early feature dilution. Although recent studies have begun incorporating pathology-rich data, challenges remain in effectively leveraging multi-modality inputs to reconstruct diverse features, particularly in heterogeneous patient populations. To address these limitations, we introduce a multi-modality multi-task diffusion model (M2Diff) that processes MRI and LD PET scans separately to learn modality-specific features and fuses them via hierarchical feature fusion to reconstruct SD PET. This design enables effective integration of complementary structural (MRI) and functional (PET) information, leading to improved reconstruction fidelity. We validated M2Diff on both healthy and Alzheimer's disease brain datasets, where it achieves superior qualitative and quantitative performance compared with the baseline methods.
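The abstract identifies M2Diff as a diffusion model but does not state its noise schedule or training objective. As background only, the sketch below shows the standard DDPM forward (noising) process, $\mathbf{Y}_t = \sqrt{\bar\alpha_t}\,\mathbf{Y}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$, applied to a stand-in SD PET slice $\mathbf{Y}_0$. The linear $\beta$ schedule, $T = 1000$ steps, the $\epsilon$-prediction loss, and all function names are assumptions rather than details from the paper; the authors' hybrid losses are not reproduced.

```python
import numpy as np

def linear_beta_schedule(num_steps: int = 1000,
                         beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linear variance schedule from the original DDPM formulation (assumed, not taken from M2Diff)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(y0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Sample Y_t ~ q(Y_t | Y_0) in closed form and return (Y_t, noise)."""
    eps = rng.standard_normal(y0.shape)
    y_t = np.sqrt(alpha_bar[t]) * y0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return y_t, eps

def eps_mse_loss(eps_pred: np.ndarray, eps: np.ndarray) -> float:
    """Standard epsilon-prediction MSE (hypothetical stand-in; M2Diff's hybrid loss is not shown)."""
    return float(np.mean((eps_pred - eps) ** 2))

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)             # cumulative product of (1 - beta_s) up to step t

rng = np.random.default_rng(0)
sd_pet = rng.uniform(0.0, 1.0, size=(64, 64))   # hypothetical normalised SD PET slice Y_0
y_t, eps = q_sample(sd_pet, t=500, alpha_bar=alpha_bar, rng=rng)

# A conditional denoiser eps_theta(Y_t, t, X_i, Z_i) (hypothetical) would be trained to
# predict `eps` from the noisy slice plus the LD PET and MRI inputs.
print(eps_mse_loss(eps, eps))                   # perfect prediction -> 0.0
print(eps_mse_loss(np.zeros_like(eps), eps))    # predicting zero noise -> roughly 1.0
```

Because $\bar\alpha_t$ decreases monotonically towards zero, late timesteps are almost pure noise, which is why the reverse process needs the LD PET and MRI conditioning to steer generation towards the correct SD PET image.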

Paper Structure

This paper contains 18 sections, 11 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Overview of the proposed M2Diff architecture for multi-task PET reconstruction. The model takes low-dose PET ($\mathbf{X}_i$) and anatomical MRI ($\mathbf{Z}_i$) as inputs and comprises two task-specific branches, Task 1 and Task 2, that target complementary PET reconstruction goals. Each branch includes an encoder $\mathcal{F}_k$ and a decoder $\mathcal{D}_k$ ($k \in \{1, 2\}$). Features from all encoder levels are fused through a Hierarchical Feature Fusion module, enabling cross-task interaction. Final outputs $\hat{\mathbf{Y}}_0$ are optimised via hybrid losses.
  • Figure 2: Comparison of reconstructed PET images from baseline models and the proposed M2Diff on the DaCRA dataset at a dose reduction factor (DRF) of $\times 100$. Each row corresponds to a different subject scan. Each column shows either a reference image (T1-weighted MRI, low-dose (LD) input, standard-dose (SD) ground truth) or the output of a reconstruction method: the competing CycleWGAN, Pix2PixHD, PT-WGAN, Multi-branch UNet, CDM, IDDPM, DDPM-PETMR, and DiffusionMTL, and the proposed M2Diff. The upper portion of each tile shows the full axial brain slice, while the lower zoomed-in views (red-bordered) highlight regions of interest. All PET images, including zoomed ROIs, are displayed using a fixed intensity range of [0, 1] for all methods.
  • Figure 3: Comparison of reconstructed PET images from baseline models and the proposed M2Diff on the DaCRA dataset at a DRF of $\times 20$. Each row corresponds to a different subject scan. Each column shows either a reference image (T1-weighted MRI, low-dose (LD) input, standard-dose (SD) ground truth) or the output of a reconstruction method: the competing CycleWGAN, Pix2PixHD, Multi-branch UNet, IDDPM, DDPM-PETMR, and DiffusionMTL, and the proposed M2Diff. The upper portion of each tile shows the full axial brain slice, while the lower zoomed-in views (red-bordered) highlight regions of interest. All PET images, including zoomed ROIs, are displayed using a fixed intensity range of [0, 1] for all methods.
  • Figure 4: Qualitative comparison of reconstructed PET images from baseline models and the proposed M2Diff on the ADNI dataset. Each row corresponds to a different patient scan. Each column shows either a reference image (T1-weighted MRI, low-dose (LD) input, standard-dose (SD) ground truth) or the output of a reconstruction method: the competing CycleWGAN, Pix2PixHD, Multi-branch UNet, IDDPM, and DDPM-PETMR, and the proposed M2Diff. The upper portion of each tile shows the full axial brain slice, while the lower zoomed-in views (red-bordered) highlight regions of interest. All PET images, including zoomed ROIs, are displayed using a fixed intensity range of [0, 1] for all methods.
  • Figure 5: Qualitative visualization of the proposed M2Diff model's reconstructions in sagittal and coronal views. Each column shows the corresponding T1-weighted MRI, low-dose (LD) PET input, standard-dose (SD) reference, and reconstructed PET output. The upper portion of each tile shows the full brain slice, while the lower zoomed-in views (red-bordered) highlight regions of interest. All PET images, including zoomed ROIs, are displayed using a fixed intensity range of [0, 1].
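Figure 1 describes two branches, each with an encoder $\mathcal{F}_k$ and a decoder $\mathcal{D}_k$, whose multi-level encoder features are combined by a Hierarchical Feature Fusion module. The NumPy sketch below illustrates that data flow on toy arrays; it is not the authors' implementation. Assigning LD PET ($\mathbf{X}_i$) to branch 1 and MRI ($\mathbf{Z}_i$) to branch 2, the 1×1 channel-mixing layers, concatenation-based fusion, nearest-neighbour upsampling, additive skip connections, random weights, and all function names are assumptions, and the diffusion inputs (noisy image and timestep) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8        # channels per level (assumed)
LEVELS = 3   # number of encoder levels (assumed)

def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """1x1 convolution: mix channels with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", w, x)

def encoder(x, weights):
    """Toy encoder F_k: returns the feature map of every level, fine to coarse."""
    feats = []
    for w in weights:
        x = np.maximum(conv1x1(w, x), 0.0)  # channel mixing + ReLU
        feats.append(x)
        x = avg_pool2(x)                    # downsample for the next level
    return feats

def hierarchical_fusion(feats_a, feats_b, mix):
    """Fuse the two branches at every level: concatenate channels, project back to C."""
    return [conv1x1(m, np.concatenate([fa, fb], axis=0))
            for fa, fb, m in zip(feats_a, feats_b, mix)]

def decoder(fused, head):
    """Toy decoder D_k: coarse-to-fine upsampling with additive skips from the fused levels."""
    x = fused[-1]
    for skip in reversed(fused[:-1]):
        x = upsample2(x) + skip
    return conv1x1(head, x)                 # project to a single-channel PET image

def init_encoder(c_in):
    """Random weights for one branch: first level maps c_in -> C, later levels C -> C."""
    return [rng.normal(0, 0.1, (C, c_in if i == 0 else C)) for i in range(LEVELS)]

# Hypothetical inputs: one LD PET slice X_i and one co-registered T1 MRI slice Z_i.
x_ld_pet = rng.uniform(0, 1, (1, 64, 64))
z_mri = rng.uniform(0, 1, (1, 64, 64))

f1, f2 = init_encoder(1), init_encoder(1)   # F_1 (PET branch), F_2 (MRI branch): assumed assignment
mix = [rng.normal(0, 0.1, (C, 2 * C)) for _ in range(LEVELS)]
fused = hierarchical_fusion(encoder(x_ld_pet, f1), encoder(z_mri, f2), mix)

# Assumption: both task-specific decoders read the same fused hierarchy.
y_hat_1 = decoder(fused, rng.normal(0, 0.1, (1, C)))
y_hat_2 = decoder(fused, rng.normal(0, 0.1, (1, C)))
print([f.shape for f in fused], y_hat_1.shape)
```

Fusing at every level, rather than concatenating the raw inputs once at the start, lets each branch build modality-specific features before they interact, which is the motivation the abstract gives for avoiding early feature dilution.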