SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains

Qingmei Li, Yang Zhang, Peifeng Zhang, Haohuan Fu, Juepeng Zheng

TL;DR

This work tackles privacy-constrained domain generalization for semantic segmentation by freezing the target model and introducing SAGE, which uses style-driven input prompts to bridge domain gaps. It generates multiple style-prompts from stylized source data and dynamically fuses them per input via cross-attention, enabling robust cross-domain performance without touching model parameters. The approach achieves competitive or superior results on five benchmarks with drastically fewer trainable parameters and demonstrates clear benefits from adaptive prompt design and fusion. These findings highlight a practical path for deploying robust, privacy-preserving segmentation systems across diverse urban environments.

Abstract

Domain generalization for semantic segmentation aims to mitigate the performance degradation caused by domain shifts. In many real-world scenarios, however, the model parameters and architectural details are inaccessible because of privacy concerns and security constraints. This rules out traditional fine-tuning or adaptation and creates a demand for input-level strategies that improve generalization without modifying model weights. To this end, we propose a Style-Adaptive GEneralization framework (SAGE), which improves the generalization of frozen models under privacy constraints. Instead of fine-tuning the backbone directly, SAGE learns to synthesize visual prompts that implicitly align feature distributions across styles. Specifically, we first use style transfer to construct a diverse style representation of the source domain, thereby learning a set of style characteristics that cover a wide range of visual features. The model then adaptively fuses these style cues according to the visual context of each input, forming a dynamic prompt that harmonizes the image appearance without touching the interior of the model. Through this closed-loop design, SAGE bridges the gap between frozen model invariance and the diversity of unseen domains. Extensive experiments on five benchmark datasets demonstrate that SAGE achieves competitive or superior performance compared to state-of-the-art methods under privacy constraints and outperforms full fine-tuning baselines in all settings.
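
The fusion step described above (several style cues combined per input into one dynamic prompt) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scaled dot-product attention in which a feature vector of the input image acts as the query and each style-prompt has an associated key vector. Plain Python lists stand in for tensors, and the names `fuse_prompts` and `apply_prompt`, the additive prompt, and the [0, 1] pixel range are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_prompts(query, keys, prompts):
    """Fuse K style-prompts into one prompt with attention weights.

    query:   d-dim feature of the input image (assumed given)
    keys:    K vectors of dimension d, one per style-prompt (assumed given)
    prompts: K flattened prompts, each with n elements
    """
    d = len(query)
    # Scaled dot-product score between the input and each style-prompt key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    n = len(prompts[0])
    # Weighted sum of the prompts, element by element.
    fused = [sum(w * p[i] for w, p in zip(weights, prompts))
             for i in range(n)]
    return weights, fused


def apply_prompt(image, prompt):
    """Add the prompt to the input and clip to [0, 1].

    Only the input changes; the segmentation model itself is never touched.
    """
    return [min(1.0, max(0.0, x + p)) for x, p in zip(image, prompt)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    prompts = [[0.2, 0.0], [0.0, 0.2]]
    weights, fused = fuse_prompts(query, keys, prompts)
    print(weights, apply_prompt([0.5, 0.5], fused))
```

In this sketch the attention weights depend on the input, so two images from different domains receive different mixtures of the same style-prompts, while the frozen model only ever sees the prompted image.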

Paper Structure

This paper contains 34 sections, 32 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of domain generalization settings under privacy constraints. (a) Standard DGSS requires backbone tuning. (b) Style-fixed prompting is ineffective on unseen domains. (c) SAGE inference overcomes privacy limitations.
  • Figure 2: The training process of SPG: each style reference guides the source image through a dedicated generator $G_i$ to produce a style-aware prompt. Prompts are merged and used to optimize a frozen privacy model. Each style-prompt generator consists of a modulation network and a learnable prompt template, producing the final style-prompt via element-wise modulation.
  • Figure 3: The overview of APF.
  • Figure 4: Ablation study on prompt design. (a) and (b) show the mIoU performance comparison between different prompt generator types on the G $\rightarrow$ {C, B, M, S} and S $\rightarrow$ {C, B, M, G} tasks, respectively. (c) and (d) evaluate the impact of different prompt template initialization strategies.
  • Figure 5: Attention weight distribution over four style-prompts (lakeside, pier, valley, volcano) for images from different target domains (BDD-100K, Cityscapes, GTAV, Mapillary). The results indicate that each domain exhibits distinct style preferences.
  • ...and 3 more figures
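
The Figure 2 caption describes each style-prompt generator as a modulation network plus a learnable prompt template, combined by element-wise modulation. A rough sketch of that composition follows. Everything beyond the caption is an assumption: here the modulation network is a single sigmoid layer, its input is the per-channel mean and standard deviation of a style reference, and the names `style_stats`, `modulation`, and `style_prompt` are hypothetical.

```python
import math


def style_stats(image):
    """Per-channel mean and std of an image given as a list of channels.

    Assumption: these statistics summarize the style reference.
    """
    stats = []
    for channel in image:
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        stats.extend([mean, math.sqrt(var)])
    return stats


def modulation(stats, weights, bias):
    """Hypothetical one-layer modulation network.

    Returns one gate in (0, 1) per prompt element: sigmoid(W @ stats + b).
    """
    gates = []
    for row, b in zip(weights, bias):
        z = sum(w * s for w, s in zip(row, stats)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates


def style_prompt(template, gates):
    """Element-wise modulation of the learnable template by the gates."""
    return [t * g for t, g in zip(template, gates)]


if __name__ == "__main__":
    reference = [[0.0, 1.0], [0.5, 0.5]]       # two channels, two pixels each
    stats = style_stats(reference)             # four style statistics
    template = [0.2, -0.4, 0.6]                # learnable in a real system
    weights = [[0.0] * 4 for _ in template]    # untrained: all gates are 0.5
    bias = [0.0] * len(template)
    print(style_prompt(template, modulation(stats, weights, bias)))
```

In a trained system the template, weights, and bias would be the only learnable quantities, which is consistent with the small trainable-parameter count the summary reports; the exact architecture is not given in this excerpt.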