SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation

Yanqi Ge; Ye Huang; Wen Li; Lixin Duan

TL;DR

SSR tackles domain shift in semantic segmentation by leveraging the Segment-Anything (SAM) model as a training-time regularizer. It introduces a two-branch architecture, with a training-only regularization branch using a frozen SAM and a shadow inference branch that shares the backbone and decoder, ensuring zero additional inference cost. Cross-attention between MiT-B5 backbone features and SAM features aligns representations across four stages, improving robustness on the GTA5→Cityscapes benchmark and yielding consistent gains for both DAFormer and MIC baselines. The results demonstrate that foundation-model pretraining can enhance domain generalization in segmentation without sacrificing runtime efficiency, offering a practical pathway for robust domain adaptation.

Abstract

We introduce SSR, which uses SAM (Segment Anything) as a strong regularizer during training to greatly enhance the robustness of the image encoder across domains. Specifically, because SAM is pre-trained on a large number of internet images covering a diverse variety of domains, the features extracted by SAM are less dependent on specific domains than those of a traditional ImageNet pre-trained image encoder. Meanwhile, the ImageNet pre-trained image encoder remains a mature choice of backbone for semantic segmentation, especially since SAM is category-agnostic. As a result, SSR provides a simple yet highly effective design: it uses the ImageNet pre-trained image encoder as the backbone, and the intermediate feature of each stage (i.e., the 4 stages of MiT-B5) is regularized by SAM during training. In extensive experiments on GTA5$\rightarrow$Cityscapes, SSR significantly improved performance over the baseline without introducing any extra inference overhead.
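
The two-branch idea can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`cross_attention`, `ssr_forward`), the choice of backbone tokens as queries and SAM tokens as keys/values, the absence of learned projections, and the assumption that all features share one width are hypothetical simplifications. The regularization loss is omitted because the summary does not specify it. The sketch only shows that the SAM-dependent path runs when SAM features are supplied (training) and is skipped otherwise (inference).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention, no learned projections.

    queries:     list of N_q token vectors (each a list of d floats)
    keys_values: list of N_kv token vectors of the same width d
    Returns N_q vectors, each a convex combination of the key/value tokens.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

def ssr_forward(stage_feats, sam_feats=None):
    """Hypothetical two-branch forward pass.

    stage_feats: four per-stage token lists from the backbone (assumed to be
                 already projected to the same width as sam_feats).
    sam_feats:   tokens from the frozen SAM encoder, or None at inference.
    Returns (shadow_feats, reg_feats); reg_feats is None when SAM is absent.
    """
    shadow_feats = stage_feats  # shadow branch: features go to the decoder as-is
    if sam_feats is None:
        return shadow_feats, None  # inference: SAM is never evaluated
    # Assumption: backbone tokens act as queries, SAM tokens as keys/values.
    reg_feats = [cross_attention(f, sam_feats) for f in stage_feats]
    return shadow_feats, reg_feats
```

Calling `ssr_forward(stage_feats)` without SAM features returns the same shadow-branch features as the training call, which is the sense in which the design adds no inference overhead.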

Paper Structure (10 sections, 2 figures, 2 tables)

Introduction
Proposed methods
Regularization branch
Shadow branch
Training details
Experiments
Ablation studies
Compare with baselines
Visualization
Conclusion

Figures (2)

Figure 1: The proposed SSR (SAM is a Strong Regularizer) architecture consists of two branches. The regularization branch, used only during training, includes the frozen SAM. The shadow branch is a simple encoder-decoder used in both training and inference. Best viewed zoomed in.
Figure 2: Comparison of DAFormer and DAFormer + SSR on the Cityscapes dataset.
