S2S2: Semantic Stacking for Robust Semantic Segmentation in Medical Imaging

Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang

TL;DR

Robust semantic segmentation in medical imaging is hampered by limited and heterogeneous data. The authors introduce Semantic Stacking for Semantic Segmentation (S2S2), a domain-agnostic add-on that builds a "semantic stack" of synthetic images, all generated from the same segmentation map, and uses it to denoise the model's semantic representations through a semantic-consistency loss $\mathcal{L}_{sc}$. They derive a practical objective based on Bayesian updating that needs only two samples from the stack per training iteration, and they generate the stack with a fine-tuned Stable Diffusion 2.x model with ControlNet. Across RGB, CT, and MRI datasets and multiple architectures, S2S2 improves both in-domain and out-of-domain performance. Ablations identify encoder-level consistency as the main contributor, and the method remains compatible with, and complementary to, domain-specific augmentations. This data-driven, modality-agnostic approach offers a versatile path to robust medical image segmentation without relying on domain-specific priors.
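The two-sample idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the exact form of $\mathcal{L}_{sc}$ is not given in this summary, so mean squared error between the two encoder feature maps is used here as a stand-in, and the function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def softmax(logits, axis=1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    # logits: (N, C, H, W) floats; labels: (N, H, W) integer class ids.
    probs = softmax(logits, axis=1)
    picked = np.take_along_axis(probs, labels[:, None, :, :], axis=1)
    return float(-np.log(picked + 1e-12).mean())

def consistency_loss(feat_a, feat_b):
    # Assumed stand-in for L_sc: MSE between encoder features of two
    # images that share the same segmentation map.
    return float(np.mean((feat_a - feat_b) ** 2))

def two_sample_loss(feat_a, feat_b, logits_a, logits_b, labels, weight=1.0):
    # Segmentation loss on both samples plus the weighted consistency term.
    seg = 0.5 * (cross_entropy(logits_a, labels) + cross_entropy(logits_b, labels))
    sc = consistency_loss(feat_a, feat_b)
    return seg + weight * sc, seg, sc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=(2, 8, 8))   # shared ground-truth map
    feat_a = rng.normal(size=(2, 16, 4, 4))       # encoder features, sample 1
    feat_b = rng.normal(size=(2, 16, 4, 4))       # encoder features, sample 2
    logits_a = rng.normal(size=(2, 3, 8, 8))
    logits_b = rng.normal(size=(2, 3, 8, 8))
    total, seg, sc = two_sample_loss(feat_a, feat_b, logits_a, logits_b, labels)
    print(f"total={total:.4f} seg={seg:.4f} sc={sc:.4f}")
```

In a real training loop the two feature maps would come from the same encoder applied to two images drawn from the stack (for example, the real image and one synthetic variant), and the consistency term would be differentiated along with the segmentation loss.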

Abstract

Robustness and generalizability in medical image segmentation are often hindered by scarcity and limited diversity of training data, which stands in contrast to the variability encountered during inference. While conventional strategies (such as domain-specific augmentation, specialized architectures, and tailored training procedures) can alleviate these issues, they depend on the availability and reliability of domain knowledge. When such knowledge is unavailable, misleading, or improperly applied, performance may deteriorate. In response, we introduce a novel, domain-agnostic, add-on, and data-driven strategy inspired by image stacking in image denoising. Termed "semantic stacking," our method estimates a denoised semantic representation that complements the conventional segmentation loss during training. This method does not depend on domain-specific assumptions, making it broadly applicable across diverse image modalities, model architectures, and augmentation techniques. Through extensive experiments, we validate the superiority of our approach in improving segmentation performance under diverse conditions. Code is available at https://github.com/ymp5078/Semantic-Stacking.

Paper Structure

This paper contains 23 sections, 6 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: An illustration of the proposed semantic stacking approach compared to traditional image stacking for noise reduction. (a) Image stacking for noise reduction in imagery. (b) Our semantic stacking technique, aimed at reducing feature noise. Here, we illustrate semantic features through semantic segmentation maps for clarity, though our method operates on encoded features.
  • Figure 2: Illustration of the proposed S2S2 framework. A stack of images is generated from the given ground-truth semantic segmentation map. Two samples from the stack are then fed into the network, where training is guided by the consistency between their features alongside the segmentation loss.
  • Figure 3: Visualization of the improvement achieved by applying S2S2 to the base method in the in-domain setting. 'GT' is the ground truth. 'Base' refers to the corresponding method without S2S2.
  • Figure 4: Visualization of the improvement achieved by applying S2S2 to the base method in the out-of-domain setting. 'GT' is the ground truth. 'Base' refers to the corresponding method without S2S2.
  • Figure 5: Ablation study results using FCBFormer with the proposed S2S2 method. Dashed lines indicate the performance of the base method.
  • ...and 3 more figures