Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process
Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng
TL;DR
Diffusion-model-based medical image segmentation is powerful but computationally intensive, often requiring multi-step reverse processes and multiple samples for reliable predictions. This work introduces SDSeg, a latent diffusion segmentation model built on Stable Diffusion that uses a simple latent estimation loss to enable a single-step reverse process and a concatenate latent fusion strategy to remove the need for multiple samples, complemented by a trainable vision encoder for cross-domain adaptability. SDSeg achieves state-of-the-art performance on five datasets spanning RGB 2D and CT 3D modalities while dramatically reducing training requirements and enabling fast, stable inference. The approach offers a practical, scalable solution for automated medical image segmentation with reliable outputs.
Abstract
Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also require a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first latent diffusion segmentation model, named SDSeg, built upon Stable Diffusion (SD). SDSeg incorporates a straightforward latent estimation strategy to facilitate a single-step reverse process and utilizes a concatenate latent fusion strategy to remove the necessity for multiple samples. Extensive experiments indicate that SDSeg surpasses existing state-of-the-art methods on five benchmark datasets featuring diverse imaging modalities. Remarkably, SDSeg is capable of generating stable predictions with a single reverse step and a single sample, demonstrating the stability implied by its name. The code is available at https://github.com/lin-tianyu/Stable-Diffusion-Seg.
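To make the single-step idea concrete, the following is a minimal PyTorch sketch based only on the summary above: it assumes a standard DDPM-style noise parameterization, recovers the clean segmentation latent z0 directly from the predicted noise instead of iterating over timesteps, and fuses the image latent with the noisy mask latent by channel concatenation. All names (denoiser, vae_encode, vae_decode, alpha_bar) are hypothetical placeholders, not the actual interface of the released code.

    import torch

    def single_step_segment(denoiser, vae_encode, vae_decode, image, alpha_bar, t):
        """Sketch of a single-step reverse process for segmentation.

        denoiser   -- network predicting noise eps_hat from (latents, t)  [assumed interface]
        vae_encode -- trainable vision encoder mapping the image to its latent
        vae_decode -- decoder mapping a latent back to a segmentation map
        alpha_bar  -- cumulative noise schedule, tensor of shape [T]
        t          -- a fixed timestep index used at inference
        """
        cond = vae_encode(image)              # image latent used as condition
        z_t = torch.randn_like(cond)          # start from pure noise
        # "Concatenate latent fusion": fuse the image latent with the noisy mask latent
        eps_hat = denoiser(torch.cat([z_t, cond], dim=1), t)
        a = alpha_bar[t]
        # Estimate the clean latent z0 in one step instead of iterating T reverse steps:
        z0_hat = (z_t - torch.sqrt(1 - a) * eps_hat) / torch.sqrt(a)
        return vae_decode(z0_hat)             # decoded segmentation mask

Under this reading, the latent estimation loss would be a simple regression loss (e.g., L1 or L2) between z0_hat and the encoded ground-truth mask latent during training, which is what allows a single reverse step to suffice at inference.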
