FOSCU: Feasibility of Synthetic MRI Generation via Duo-Diffusion Models for Enhancement of 3D U-Nets in Hepatic Segmentation

Youngung Han, Kyeonghun Kim, Seoyoung Ju, Yeonju Jean, Minkyung Cha, Seohyoung Park, Hyeonseok Jung, Nam-Joon Kim, Woo Kyoung Jeong, Ken Ying-Kai Liao, Hyuk-Jae Lee

Abstract

Medical image segmentation faces fundamental challenges, including restricted access to clinical datasets through Picture Archiving and Communication Systems (PACS), costly annotation, and data shortage. These systemic barriers significantly impede the development of robust segmentation algorithms. To address these challenges, we propose FOSCU, which integrates two components: Duo-Diffusion, a 3D latent diffusion model with ControlNet that generates high-resolution, anatomically realistic synthetic MRI volumes together with their corresponding segmentation labels, and an enhanced 3D U-Net training pipeline. Duo-Diffusion employs segmentation-conditioned diffusion to ensure spatial consistency and precise anatomical detail in the generated data. Experimental evaluation on 720 abdominal MRI scans shows that models trained on combined real and synthetic data achieve a mean Dice score gain of 0.67% over those trained on real data alone, and that the synthetic data achieve a 36.4% reduction in Fréchet Inception Distance (FID), reflecting improved image fidelity.
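The abstract reports results in terms of two metrics, the Dice score for segmentation overlap and FID for image fidelity. As a reference point, here is a minimal NumPy sketch of both, under stated assumptions: masks are binary arrays of equal shape, two empty masks are scored as a perfect match (a convention, not taken from the paper), and FID is computed from already-extracted feature vectors. The function names `dice_score`, `frechet_distance`, and `_psd_sqrt` are illustrative and do not come from the paper.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = int(pred.sum()) + int(target.sum())
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    intersection = int(np.logical_and(pred, target).sum())
    return 2.0 * intersection / denom


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh((mat + mat.T) / 2)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style distance between two feature sets of shape (n_samples, n_features).

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}); the inner matrix is
    # symmetric PSD, so the trace of its square root is the sum of sqrt of its eigenvalues.
    root_a = _psd_sqrt(cov_a)
    inner = root_a @ cov_b @ root_a
    eig = np.linalg.eigvalsh((inner + inner.T) / 2)
    tr_covmean = float(np.sqrt(np.clip(eig, 0.0, None)).sum())
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
```

Standard FID takes its features from an Inception-v3 network. The feature extractor applied to the 3D MRI volumes is not stated in this summary, so feature extraction is deliberately left out of the sketch.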

Paper Structure

This paper contains 12 sections, 3 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Examples of generated high-resolution synthetic abdominal MRI volumes with corresponding segmentation conditions overlaid. (a) Axial, (b) Coronal, and (c) Sagittal views showing segmentation masks inferred by a 3D U-Net trained on data generated with Duo-Diffusion. (d) Axial slices extracted from the generated 3D synthetic MRI volume.
  • Figure 2: Overview of the proposed FOSCU framework. The workflow consists of three steps: (1) training the Duo-Diffusion architecture, which involves first training a 3D latent diffusion model (LDM) to generate synthetic labels and then training a ControlNet to synthesize MRI volumes conditioned on paired labels, (2) performing inference with the pretrained Duo-Diffusion models to sequentially generate synthetic labels and corresponding synthetic volumes, and (3) leveraging the generated paired synthetic data to train a liver segmentation model capable of segmenting liver structures from synthetic MR images.
  • Figure 3: The segmentation model in the proposed FOSCU framework is based on a standard 3D U-Net architecture. The network takes 3D synthetic abdominal MRI volumes as input and outputs liver segmentation masks. The numbers in each block indicate the feature map dimensions $H\times W\times D\times C$, corresponding to height, width, depth, and channels.
  • Figure 4: Representative examples of synthetic abdominal MRI generation. The left columns show non-tumor cases and the right columns show tumor (hepatocellular carcinoma, HCC) cases. Each example includes the ground-truth label, the original volume, and the synthetic volume generated by Duo-Diffusion.
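The workflow in Figure 2 depends on an ordering: a label is sampled first, and the volume is then generated conditioned on that label, so every synthetic volume arrives with a voxel-aligned mask. The sketch below illustrates only that data flow, under explicit assumptions. `sample_label` and `sample_volume` are hypothetical callables standing in for the trained 3D LDM and the ControlNet, and no diffusion model is implemented. `toy_label` and `toy_volume` are placeholder generators that produce small noisy boxes. The concatenate-and-shuffle mixing in `build_training_set` is likewise an assumption and not the paper's stated strategy.

```python
from typing import Callable, List, Tuple

import numpy as np

# A training pair: (volume, label), both voxel-aligned arrays of shape (H, W, D).
Pair = Tuple[np.ndarray, np.ndarray]


def generate_synthetic_pairs(
    n: int,
    sample_label: Callable[[np.random.Generator], np.ndarray],
    sample_volume: Callable[[np.ndarray, np.random.Generator], np.ndarray],
    seed: int = 0,
) -> List[Pair]:
    """Step (2): sample a label first, then a volume conditioned on that label."""
    rng = np.random.default_rng(seed)
    pairs: List[Pair] = []
    for _ in range(n):
        label = sample_label(rng)           # hypothetical stand-in for the label-generating 3D LDM
        volume = sample_volume(label, rng)  # hypothetical stand-in for the label-conditioned ControlNet
        if volume.shape != label.shape:
            raise ValueError("volume and label must be voxel-aligned")
        pairs.append((volume, label))
    return pairs


def build_training_set(real: List[Pair], synthetic: List[Pair], seed: int = 0) -> List[Pair]:
    """Step (3): pool real and synthetic pairs and shuffle them (assumed mixing strategy)."""
    combined = list(real) + list(synthetic)
    order = np.random.default_rng(seed).permutation(len(combined))
    return [combined[i] for i in order]


def toy_label(rng: np.random.Generator) -> np.ndarray:
    """Placeholder label sampler: a randomly placed box standing in for a liver mask."""
    label = np.zeros((16, 16, 8), dtype=np.uint8)
    cx, cy = rng.integers(4, 12, size=2)
    label[cx - 3:cx + 3, cy - 3:cy + 3, 2:6] = 1
    return label


def toy_volume(label: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Placeholder volume sampler: the label plus Gaussian noise, so intensity follows the mask."""
    noise = 0.1 * rng.standard_normal(label.shape)
    return (label + noise).astype(np.float32)


if __name__ == "__main__":
    synthetic = generate_synthetic_pairs(4, toy_label, toy_volume, seed=1)
    real = generate_synthetic_pairs(2, toy_label, toy_volume, seed=2)  # placeholder for real scans
    train = build_training_set(real, synthetic)
    print(len(train), train[0][0].shape)  # 6 (16, 16, 8)
```

The samplers are passed in as callables so that the pairing logic stays independent of any particular diffusion library. In a real setup, they would wrap the sampling loops of the pretrained models.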