FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model

Molin Zhang, Polina Golland, Patricia Ellen Grant, Elfar Adalsteinsson

TL;DR

FetalDiffusion targets the fetal motion problem in MRI by synthesizing 3D fetal MRI volumes with controllable poses using a pose-conditioned diffusion model. Generation is conditioned, via Pose Condition Blocks, on a pose mask built from 15 landmarks and limb regions, and an auxiliary pose-level loss enforces pose-consistent synthesis. The approach yields high-fidelity synthetic data and improves fetal pose estimation when real training data are scarce, with a reported 15.4% increase in PCK and a 50.2% reduction in mean error. All experiments were run on a single 32 GB V100 GPU. The results point to data-efficient, pose-aware synthesis with practical implications for real-time fetal motion tracking.

Abstract

The quality of fetal MRI is significantly affected by unpredictable and substantial fetal motion, which introduces artifacts even when fast acquisition sequences are employed. The development of 3D real-time fetal pose estimation approaches on volumetric EPI fetal MRI opens up a promising avenue for fetal motion monitoring and prediction. Fetal pose estimation is challenging because of the limited number of real scanned fetal MR training images, which hinders model generalization when the acquired fetal MRI does not cover an adequate range of poses. In this study, we introduce FetalDiffusion, a novel approach that uses a conditional diffusion model to generate 3D synthetic fetal MRI with controllable pose. Additionally, an auxiliary pose-level loss is adopted to enhance model performance. Our work demonstrates the success of the proposed model by producing high-quality synthetic fetal MRI images with accurate and recognizable fetal poses that compare favorably with in-vivo real fetal MRI. Furthermore, we show that integrating synthetic fetal MR images enhances the fetal pose estimation model's performance, particularly when the amount of available real scanned data is limited, resulting in a 15.4% increase in PCK and a 50.2% reduction in mean error. All experiments are done on a single 32GB V100 GPU. Our method holds promise for improving real-time tracking models, thereby addressing fetal motion issues more effectively.
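
The two reported metrics can be illustrated with a minimal sketch. It assumes PCK is computed as the fraction of predicted keypoints whose Euclidean distance to the ground truth falls within a threshold, and that mean error is the average of those distances. The function names, the threshold value, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np

def pck(pred, target, threshold):
    """Fraction of keypoints whose Euclidean error is at most `threshold`."""
    dists = np.linalg.norm(pred - target, axis=-1)
    return float(np.mean(dists <= threshold))

def mean_error(pred, target):
    """Average Euclidean distance between predicted and true keypoints."""
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

# Toy example: 15 landmarks in 3D, two of them predicted with some error.
target = np.zeros((15, 3))
pred = target.copy()
pred[0] = [3.0, 4.0, 0.0]  # distance 5 from the true position
pred[1] = [0.0, 0.0, 1.0]  # distance 1 from the true position

score = pck(pred, target, threshold=2.0)  # threshold is an assumed value
err = mean_error(pred, target)
```

In this toy case only the first landmark exceeds the assumed threshold, so it is the only one counted as incorrect.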

Paper Structure (12 sections, 5 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Overall framework. We first train the pose estimation network on a limited-size dataset. For pose conditioning, the condition mask incorporates both landmark spots and limb masks. We follow the same diffusion process as DDPM (Ho et al., 2020). The condition mask and the noisy image are concatenated as input to the 3D denoising U-Net, which has four downsampling and upsampling layers (64, 128, 128, 256 channels). Pose Condition Blocks (PCB) are embedded in the last two layers, with the mask downsampled accordingly. The attention module in each PCB uses 8 heads and the same channel count as its level (128 and 256). Using the predicted noise, we project directly back to an image estimate and feed it to the trained pose estimation network to compute an auxiliary pose loss, which enhances overall performance.
  • Figure 2: Synthetic data generated for a pose from the training dataset. Rows show images from real scan data, our proposed method, the limb mask without pose loss, and the baseline method using a landmark spot mask. Columns are slices along the z-direction. The last column displays pose estimation results from the trained network on slice 51, where the target landmark (the shoulder) is located. Our proposed method generates high-fidelity data in which limbs and landmarks are correctly positioned within the condition mask (light yellow mask) and are detectable by the trained pose estimation network. The limb mask condition alone can generate limbs under the condition mask, but the right 'arm' is not attached to the fetal body, making it undetectable by the pose estimation network, as indicated by the yellow box. The baseline fails to follow the condition information and does not generate convincing images.
  • Figure 3: Synthetic data from our proposed method for unseen poses, with the pose presented in the first column. Rows 1 and 2 depict two reference poses from the training dataset and their corresponding real scanned data. The third row displays an artificially created pose obtained by center-interpolating pose ref 1 and ref 2. The fourth row shows a pose manually created from ref 2, simulating a kicking action in the legs and elbows. The color mask is the condition mask for the diffusion model, with red for left limbs, blue for right limbs, and yellow for landmarks. Our proposed method generates high-fidelity and controllable images for these unseen poses.
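
The diffusion steps named in the Figure 1 caption (noising an image, concatenating the condition mask, and projecting predicted noise back to an image) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule with 1000 steps is the one from the DDPM paper and is only assumed here, and the function names, the toy volume size, and the single-voxel mask are hypothetical. The denoising U-Net is replaced by a perfect noise prediction so the example stays self-contained.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, noise):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def predict_x0(x_t, t, alpha_bar, eps_pred):
    """Invert the forward step: project predicted noise back to an image estimate."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 16, 16, 16))  # toy volume: (channel, D, H, W)
mask = np.zeros_like(x0)                   # hypothetical condition mask
mask[0, 8, 8, 8] = 1.0                     # one toy landmark voxel

alpha_bar = make_alpha_bar()
t = 500
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, alpha_bar, noise)

# The noisy image and the condition mask are stacked along the channel axis
# to form the input of the denoising network.
unet_input = np.concatenate([x_t, mask], axis=0)

# Stand-in for the network: with a perfect noise prediction, the projection
# recovers the clean volume. In training, this estimate would be passed to
# the pose estimation network to compute the auxiliary pose loss.
x0_hat = predict_x0(x_t, t, alpha_bar, eps_pred=noise)
```

Because the projection is a closed-form inversion of the forward step, an image estimate is available at any timestep from a single network output, which is what makes a pose loss on the estimate possible during training.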