3D-LLDM: Label-Guided 3D Latent Diffusion Model for Improving High-Resolution Synthetic MR Imaging in Hepatic Structure Segmentation

Kyeonghun Kim, Jaehyeok Bae, Youngung Han, Joo Young Bae, Seoyoung Ju, Junsu Lim, Gyeongmin Kim, Nam-Joon Kim, Woo Kyoung Jeong, Ken Ying-Kai Liao, Won Jae Lee, Pa Hong, Hyuk-Jae Lee

Abstract

Deep learning and generative models are advancing rapidly, and synthetic data is increasingly integrated into training pipelines for downstream analysis tasks. In medical imaging, however, adoption remains constrained by the scarcity of reliable annotated datasets. To address this limitation, we propose 3D-LLDM, a label-guided 3D latent diffusion model that generates high-quality synthetic magnetic resonance (MR) volumes with corresponding anatomical segmentation masks. Our approach uses hepatobiliary phase MR images enhanced with the Gd-EOB-DTPA contrast agent to derive structural masks for the liver, portal vein, hepatic vein, and hepatocellular carcinoma, which then guide volumetric synthesis through a ControlNet-based architecture. Trained on 720 real clinical hepatobiliary phase MR scans from Samsung Medical Center, 3D-LLDM achieves a Fréchet Inception Distance (FID) of 28.31, improving over GANs by 70.9% and over state-of-the-art diffusion baselines by 26.7%. When used for data augmentation, the synthetic volumes improve hepatocellular carcinoma segmentation by up to 11.153% in Dice score across five CNN architectures.

Paper Structure (13 sections, 8 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: High-quality synthetic MR volume generated by 3D-LLDM, demonstrating anatomical consistency across multiple planes: (a) coronal image, (b) sagittal image, and (c) axial image, each reformatted from the high-resolution 3D volumetric data shown in (d). Note the tumor highlighted by a red circle, the right hepatic vein indicated by an arrowhead, and the left portal vein marked by an arrow.
  • Figure 2: Overview of the training and inference pipeline of the proposed 3D-LLDM. The process consists of four steps: (1) training the autoencoder for both label and volume reconstruction, (2) training the latent diffusion model for label and volume synthesis, (3) training the ControlNet to condition the diffusion process on labels, and (4) performing inference using the pretrained latent diffusion models and ControlNet.
  • Figure 3: Qualitative comparison of synthetic MR volumes at various levels generated by state-of-the-art generative models. The columns correspond to: (1) the label-guided input, (2) the ground truth (GT) MRI scan, (3) the HA-GAN-generated synthetic image, (4) the 3D-DDPM-generated synthetic image, (5) the synthetic image generated by the 3D Latent Diffusion Model (3D LDM) with an autoencoder (AE) backbone, and (6) the synthetic image generated by the proposed 3D-LLDM (Ours).
  • Figure 4: Qualitative comparison of multi-class segmentation results using U-Net with different training datasets. Each row represents a test sample, while the columns correspond to (1) input MR image, (2) ground truth segmentation, (3) segmentation using only real training data, and (4) segmentation using real+synthetic data from 3D-LLDM. Notably, the 2nd and 4th columns show a more continuous middle hepatic vein, whereas the 3rd column shows a discontinuous appearance.
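
The abstract reports segmentation gains as Dice scores, and Figure 4 compares multi-class masks. As a reading aid, the following is a minimal sketch of how a per-class Dice coefficient could be computed on flattened label masks. The `dice_score` helper, the label ids in `LABELS`, and the toy masks are hypothetical illustrations, not the authors' evaluation code, and treating two empty masks as a perfect match is an assumed convention.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int = 1) -> float:
    """Dice coefficient for one label over two flattened masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    # Voxels where both masks carry the label.
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    denom = sum(pred_hits) + sum(target_hits)
    # Assumed convention: if neither mask contains the label, count it as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


# Hypothetical label ids for the four structures named in the abstract.
LABELS = {"liver": 1, "portal_vein": 2, "hepatic_vein": 3, "hcc": 4}

# Toy flattened masks (8 voxels); real volumes would be flattened 3D arrays.
pred = [0, 1, 1, 2, 4, 4, 3, 0]
target = [0, 1, 1, 2, 4, 0, 3, 3]

scores = {name: dice_score(pred, target, idx) for name, idx in LABELS.items()}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Computing the score per class rather than over the whole mask keeps a small structure such as a tumor or vessel from being swamped by the much larger liver region.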