StructDiff: Structure-aware Diffusion Model for 3D Fine-grained Medical Image Synthesis
Jiahao Xia, Yutao Hu, Yaolei Qi, Zhenliang Li, Wenqi Shao, Junjun He, Ying Fu, Longjiang Zhang, Guanyu Yang
TL;DR
This work tackles the scarcity of annotated 3D cardiac data by introducing StructDiff, a diffusion-based framework that generates fine-grained, topology-preserving medical volumes. It leverages a paired image–mask template to enforce explicit mask-to-image correspondence, a training-free Mask Generation Module to enrich structural priors, and a Confidence-aware Adaptive Learning strategy using Skip-Sampling Variance to weight synthetic data during downstream pre-training. The approach achieves state-of-the-art synthesis quality with strong topological fidelity and yields substantial gains in segmentation performance under data-scarce conditions. These innovations collectively enable scalable, high-fidelity generation of complex cardiac anatomies and improve downstream learning in medical imaging tasks.
Abstract
Addressing the scarcity of medical imaging data through semantic image generation has attracted growing attention in recent years. However, existing generative models mainly focus on synthesizing whole-organ or large-tissue structures and show limited capability in reproducing fine-grained anatomical details. Accurately reconstructing such details remains a significant challenge because medical data demand strict topological consistency and exhibit complex, heterogeneous 3D morphology. To address these limitations, we propose StructDiff, a Structure-aware Diffusion Model for fine-grained 3D medical image synthesis, which enables precise generation of topologically complex anatomies. In addition to conventional mask-based guidance, StructDiff introduces a paired image-mask template to guide the generation process, providing structural constraints and explicit knowledge of mask-to-image correspondence. Moreover, a Mask Generation Module (MGM) is designed to enrich mask diversity and alleviate the scarcity of high-quality reference masks. Furthermore, we propose a Confidence-aware Adaptive Learning (CAL) strategy based on Skip-Sampling Variance (SSV), which mitigates the uncertainty introduced by imperfect synthetic data when transferring to downstream tasks. Extensive experiments demonstrate that StructDiff achieves state-of-the-art performance in terms of topological consistency and visual realism, and significantly boosts downstream segmentation performance. Code will be released upon acceptance.
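
The abstract does not define how Skip-Sampling Variance is computed or how CAL turns it into a weight, so the following is only an illustrative sketch of the general idea of variance-based confidence weighting, not the authors' implementation. It assumes that each synthetic volume has been generated several times (for example under different sampling schedules), that the variance across those repeats serves as an uncertainty estimate, and that an exponential mapping converts it to a confidence weight. The function names, the `temperature` parameter, and the exponential mapping are all hypothetical.

```python
import numpy as np


def variance_confidence_weights(samples, temperature=1.0):
    """Map per-item variance across repeated generations to confidence weights.

    samples: array of shape (K, N, ...) holding K repeated generations of each
             of N synthetic items (hypothetical stand-in for an SSV-style
             uncertainty signal; the paper's definition may differ).
    Returns an array of N weights in (0, 1]; lower variance gives a higher weight.
    """
    samples = np.asarray(samples, dtype=np.float64)
    # Variance across the K repeats, computed independently for every voxel.
    per_voxel_var = samples.var(axis=0)
    # Average the voxel-wise variance into a single scalar per item.
    per_item_var = per_voxel_var.reshape(per_voxel_var.shape[0], -1).mean(axis=1)
    # Assumed mapping: exponential decay of confidence with variance.
    return np.exp(-per_item_var / temperature)


def confidence_weighted_loss(per_item_loss, weights):
    """Weighted mean of per-item losses, normalised by the total weight."""
    per_item_loss = np.asarray(per_item_loss, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return float((weights * per_item_loss).sum() / weights.sum())
```

In a downstream pre-training loop, weights of this kind would multiply the segmentation loss of each synthetic sample, so that volumes whose repeated generations disagree contribute less to the gradient.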
