StructDiff: Structure-aware Diffusion Model for 3D Fine-grained Medical Image Synthesis

Jiahao Xia, Yutao Hu, Yaolei Qi, Zhenliang Li, Wenqi Shao, Junjun He, Ying Fu, Longjiang Zhang, Guanyu Yang

TL;DR

This work tackles the scarcity of annotated 3D cardiac data by introducing StructDiff, a diffusion-based framework that generates fine-grained, topology-preserving medical volumes. It leverages a paired image–mask template to enforce explicit mask-to-image correspondence, a training-free Mask Generation Module to enrich structural priors, and a Confidence-aware Adaptive Learning strategy using Skip-Sampling Variance to weight synthetic data during downstream pre-training. The approach achieves state-of-the-art synthesis quality with strong topological fidelity and yields substantial gains in segmentation performance under data-scarce conditions. These innovations collectively enable scalable, high-fidelity generation of complex cardiac anatomies and improve downstream learning in medical imaging tasks.

Abstract

Solving medical imaging data scarcity through semantic image generation has attracted growing attention in recent years. However, existing generative models mainly focus on synthesizing whole-organ or large-tissue structures, showing limited capability in reproducing fine-grained anatomical details. Due to the stringent requirement of topological consistency and the complex 3D morphological heterogeneity of medical data, accurately reconstructing fine-grained anatomical details remains a significant challenge. To address these limitations, we propose StructDiff, a Structure-aware Diffusion Model for fine-grained 3D medical image synthesis, which enables precise generation of topologically complex anatomies. In addition to the conventional mask-based guidance, StructDiff further introduces a paired image-mask template to guide the generation process, providing structural constraints and offering explicit knowledge of mask-to-image correspondence. Moreover, a Mask Generation Module (MGM) is designed to enrich mask diversity and alleviate the scarcity of high-quality reference masks. Furthermore, we propose a Confidence-aware Adaptive Learning (CAL) strategy based on Skip-Sampling Variance (SSV), which mitigates uncertainty introduced by imperfect synthetic data when transferring to downstream tasks. Extensive experiments demonstrate that StructDiff achieves state-of-the-art performance in terms of topological consistency and visual realism, and significantly boosts downstream segmentation performance. Code will be released upon acceptance.
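
The abstract describes CAL only at a high level: synthetic samples are weighted by a confidence derived from Skip-Sampling Variance. This summary does not give the exact definition of SSV or the weighting rule, so the following is only a rough sketch of the general idea of variance-based confidence weighting, not the authors' method. Everything in it is an assumption made for illustration: the function names, the use of mean voxel-wise variance over repeated generations as the uncertainty score, the exponential mapping, and the `temperature` parameter.

```python
import numpy as np


def sample_variance_score(samples: np.ndarray) -> float:
    """Mean voxel-wise variance over K repeated generations of one volume.

    samples has shape (K, D, H, W). Hypothetical stand-in for an SSV-like
    score; the paper's actual definition is not given in this summary.
    """
    if samples.shape[0] < 2:
        raise ValueError("need at least two generations to measure variance")
    return float(samples.var(axis=0).mean())


def confidence_weights(scores: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Map variance scores to weights in (0, 1]; lower variance gives a higher weight.

    The exponential form is an assumed choice, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=np.float64)
    return np.exp(-scores / temperature)


def weighted_loss(per_sample_loss: np.ndarray, weights: np.ndarray) -> float:
    """Weighted average of per-sample losses, normalized by the weight sum."""
    weights = np.asarray(weights, dtype=np.float64)
    losses = np.asarray(per_sample_loss, dtype=np.float64)
    return float((weights * losses).sum() / weights.sum())
```

Under these assumptions, a synthetic volume whose repeated generations agree closely receives a weight near 1, while an unstable one is down-weighted in the pre-training loss rather than discarded outright.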

Paper Structure

This paper contains 35 sections, 14 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Synthesizing fine-grained 3D anatomical structures remains challenging due to three key factors. Consequently, existing methods often produce blurry and anatomically inconsistent results, whereas StructDiff leverages paired image–mask templates to generate more accurate and topologically coherent volumes.
  • Figure 2: Overview of the proposed framework. (a) The Structure-aware Diffusion Model (StructDiff) is designed to generate precise, diverse, and topology-preserving medical images based on template-guided conditions. (b) The Confidence-aware Adaptive Learning (CAL) strategy facilitates downstream segmentation pre-training by reducing the effect of imperfect synthetic samples.
  • Figure 3: The workflow of the Mask Generation Module.
  • Figure 4: Comparison of synthetic fine-grained cardiac images generated by existing methods and our StructDiff framework. (a) The results of the conditional generation methods based on the given mask. (b) The results of the unconditional generation methods.
  • Figure 5: The generation results when different templates are applied on the same mask reference.
  • ...and 5 more figures