MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation
Ruirui Huang, Jiacheng Li
TL;DR
The paper addresses multi-organ medical image segmentation across imaging modalities with MedCondDiff, a diffusion-based model conditioned on semantic priors. A lightweight adapter injects hierarchical priors from a Pyramid Vision Transformer (PVT) into the denoising network, producing anatomically faithful masks with lower memory use and faster inference than conventional diffusion models. Key contributions are a unified adapter framework for diffusion conditioning, a PVT-based conditioning backbone, and empirical validation on abdominal CT and brain MRI datasets showing improved efficiency at competitive accuracy. The results indicate that semantically guided diffusion is practical for medical imaging, particularly in resource-constrained settings and across diverse modalities.
Abstract
We introduce MedCondDiff, a diffusion-based framework for multi-organ medical image segmentation that is efficient and anatomically grounded. The model conditions the denoising process on semantic priors extracted by a Pyramid Vision Transformer (PVT) backbone, yielding a semantically guided and lightweight diffusion architecture. This design improves robustness while reducing both inference time and VRAM usage compared to conventional diffusion models. Experiments on multi-organ, multi-modality datasets demonstrate that MedCondDiff delivers competitive performance across anatomical regions and imaging modalities, underscoring the potential of semantically guided diffusion models as an effective class of architectures for medical imaging tasks.
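As a rough illustration of what conditioning a denoiser on hierarchical priors can look like, the sketch below adds projected multi-scale prior features to denoiser features at matching resolutions. The abstract does not specify the adapter's internals, so everything here is an assumption made for illustration: the additive injection, the 1x1 channel projection, the class name `PriorAdapter`, and all channel counts and spatial sizes are hypothetical, and random arrays stand in for real PVT and denoiser feature maps.

```python
import numpy as np


def project_1x1(feat, weight):
    """Apply a 1x1 convolution: (C_in, H, W) with weight (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)


class PriorAdapter:
    """Hypothetical adapter: one 1x1 projection per pyramid level, added to denoiser features."""

    def __init__(self, prior_channels, denoiser_channels, rng):
        # One small random projection matrix per level (assumed initialization).
        self.weights = [
            rng.normal(0.0, 0.02, size=(d, p))
            for p, d in zip(prior_channels, denoiser_channels)
        ]

    def __call__(self, denoiser_feats, prior_feats):
        conditioned = []
        for w, h, p in zip(self.weights, denoiser_feats, prior_feats):
            # Spatial sizes must match at each level for additive injection.
            assert h.shape[1:] == p.shape[1:]
            conditioned.append(h + project_1x1(p, w))
        return conditioned


rng = np.random.default_rng(0)

# Illustrative sizes only; not taken from the paper.
prior_channels = [64, 128, 320]
denoiser_channels = [32, 64, 128]
sizes = [32, 16, 8]

# Random stand-ins for backbone priors and denoiser activations.
prior_feats = [rng.normal(size=(c, s, s)) for c, s in zip(prior_channels, sizes)]
denoiser_feats = [rng.normal(size=(c, s, s)) for c, s in zip(denoiser_channels, sizes)]

adapter = PriorAdapter(prior_channels, denoiser_channels, rng)
conditioned = adapter(denoiser_feats, prior_feats)
print([f.shape for f in conditioned])
```

In this sketch the injection leaves each denoiser feature map's shape unchanged, so the adapter could in principle be attached to an existing denoising network without altering its layers. Whether MedCondDiff uses addition, concatenation, attention, or another mechanism is not stated in the abstract.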
