S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging
Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang
TL;DR
Robust semantic segmentation in medical imaging is hampered by limited and heterogeneous data. The authors introduce Semantic Stacking for Semantic Segmentation (S2S2), a domain-agnostic add-on that builds a semantic stack of synthetically generated images conditioned on segmentation maps and uses it to denoise the model's semantic representations through a semantic-consistency loss $\mathcal{L}_{sc}$. They derive a practical objective based on Bayesian updating that needs only two samples per training iteration, and they generate the stack with a fine-tuned Stable Diffusion 2.x model with ControlNet. Across RGB, CT, and MRI datasets and multiple architectures, S2S2 improves both in-domain and out-of-domain performance. Ablations identify encoder-level consistency as the main contributor, and the method remains compatible with domain-specific augmentations, which it complements rather than replaces. This data-driven, modality-agnostic approach offers a versatile path to robust medical image segmentation without relying on domain-specific priors.
Abstract
Robustness and generalizability in medical image segmentation are often hindered by scarcity and limited diversity of training data, which stands in contrast to the variability encountered during inference. While conventional strategies (such as domain-specific augmentation, specialized architectures, and tailored training procedures) can alleviate these issues, they depend on the availability and reliability of domain knowledge. When such knowledge is unavailable, misleading, or improperly applied, performance may deteriorate. In response, we introduce a novel, domain-agnostic, add-on, and data-driven strategy inspired by image stacking in image denoising. Termed "semantic stacking," our method estimates a denoised semantic representation that complements the conventional segmentation loss during training. This method does not depend on domain-specific assumptions, making it broadly applicable across diverse image modalities, model architectures, and augmentation techniques. Through extensive experiments, we validate the superiority of our approach in improving segmentation performance under diverse conditions. Code is available at https://github.com/ymp5078/Semantic-Stacking.
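To make the idea of a consistency term that complements the segmentation loss concrete, the following is a toy sketch in plain Python, not the authors' implementation. It assumes that the consistency term is a mean squared distance between the feature vectors of a real image and a synthetic image that share one label map, and that it is added to a per-pixel cross-entropy with a scalar weight. The function names, the squared-distance form, and the `weight` parameter are all hypothetical; the actual definition of the semantic-consistency loss is given in the paper and the linked repository.

```python
import math


def mean_squared_distance(a, b):
    """Mean squared difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(probs, label):
    """Cross-entropy of one pixel's class probabilities against its label."""
    return -math.log(probs[label])


def total_loss(feat_real, feat_synth, probs_real, probs_synth, labels, weight=1.0):
    """Segmentation loss on both samples plus a weighted consistency term.

    feat_real, feat_synth: flattened features of the real and synthetic
        image (assumed here to come from the same encoder).
    probs_real, probs_synth: per-pixel class probabilities for each image.
    labels: per-pixel integer labels, shared by both images.
    weight: hypothetical scalar balancing the two terms.
    """
    # Average cross-entropy over both images and all pixels.
    seg = sum(cross_entropy(p, y) for p, y in zip(probs_real, labels))
    seg += sum(cross_entropy(p, y) for p, y in zip(probs_synth, labels))
    seg /= 2 * len(labels)
    # Penalize disagreement between the two feature vectors (assumed form).
    consistency = mean_squared_distance(feat_real, feat_synth)
    return seg + weight * consistency
```

In this sketch, two images that produce identical features contribute only the segmentation term, so the consistency term acts purely as a penalty on representation drift between samples that share a label map.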
