Semantic Self-adaptation: Enhancing Generalization with a Single Sample

Sherwin Bahmani, Oliver Hahn, Eduard Zamfir, Nikita Araslanov, Daniel Cremers, Stefan Roth

TL;DR

This work tackles out-of-domain generalization in semantic segmentation with self-adaptive inference, which customizes the model parameters for each test sample. It couples pseudo-label-based test-time fine-tuning of convolutional layers with a self-adaptive normalization (SaN) scheme that blends source BN statistics with single-sample statistics, enabling robust normalization at inference. Across synthetic-to-real benchmarks and diverse backbones, the approach achieves state-of-the-art generalization, improved calibration, and favorable runtime-accuracy trade-offs compared to standard test-time augmentations and prior domain-generalization methods. The study also establishes a principled, multi-domain evaluation protocol and demonstrates broad applicability, with implications for improving robustness in real-world segmentation tasks.

Abstract

The lack of out-of-domain generalization is a critical weakness of deep networks for semantic segmentation. Previous studies relied on the assumption of a static model, i.e., once the training process is complete, model parameters remain fixed at test time. In this work, we challenge this premise with a self-adaptive approach for semantic segmentation that adjusts the inference process to each input sample. Self-adaptation operates on two levels. First, it fine-tunes the parameters of convolutional layers to the input image using consistency regularization. Second, in Batch Normalization layers, self-adaptation interpolates between the training and the reference distribution derived from a single test sample. Despite both techniques being well known in the literature, their combination sets new state-of-the-art accuracy on synthetic-to-real generalization benchmarks. Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time for improving deep network generalization to out-of-domain data. Our code and pre-trained models are available at https://github.com/visinf/self-adaptive.
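
As a rough illustration of the second level described above, the sketch below blends stored training (source) statistics with statistics computed from a single test feature map before normalizing. The linear blend, the weight name `alpha`, the function name `san_normalize`, and the use of NumPy are assumptions made for illustration; the learnable BN scale and shift are omitted, and the released code linked above remains the authoritative reference.

```python
import numpy as np


def san_normalize(x, source_mean, source_var, alpha, eps=1e-5):
    """Normalize one feature map with blended statistics (hypothetical helper).

    x:            (C, H, W) features of a single test sample
    source_mean:  (C,) running mean stored during training
    source_var:   (C,) running variance stored during training
    alpha:        assumed blend weight; 0 uses only source statistics,
                  1 uses only the statistics of the test sample
    """
    # Per-channel statistics of the single test sample.
    sample_mean = x.mean(axis=(1, 2))
    sample_var = x.var(axis=(1, 2))

    # Assumed linear interpolation between the two distributions.
    mean = (1.0 - alpha) * source_mean + alpha * sample_mean
    var = (1.0 - alpha) * source_var + alpha * sample_var

    return (x - mean[:, None, None]) / np.sqrt(var[:, None, None] + eps)
```

With `alpha=0` this reduces to ordinary inference-time Batch Normalization with the training statistics, and with `alpha=1` it normalizes each channel by the test sample alone; intermediate values trade off the two.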

Paper Structure

This paper contains 21 sections, 8 equations, 11 figures, 13 tables, 1 algorithm.

Figures (11)

  • Figure 1: Overview of the one-sample adaptation process. We augment a single test sample by creating a batch of images at multiple scales, each with horizontal flipping and grayscaling. To transform the output of each version back to the original image plane, we apply the corresponding inverse affine transformation to every prediction. After averaging the softmax probabilities, we create a pseudo-label using a class-dependent confidence threshold. We update the model parameters by minimizing the cross-entropy loss with respect to the pseudo-label, repeating this process for a small number of iterations ($N_t$) before producing the final prediction. The updated model is then discarded.
  • Figure 2: Runtime-accuracy comparison on GTA $\rightarrow$ Cityscapes generalization using one NVIDIA GeForce RTX 2080 GPU. The curves trace self-adaptation iterations, i.e., the first point corresponds to $N_t=1$, while the last shows $N_t=10$. Self-adaptation balances accuracy and inference time by adjusting iteration numbers and layer choices, and is more cost-effective than 10 network ensembles.
  • Figure 3: Hyperparameter sensitivity on GTA $\rightarrow$ Cityscapes generalization. We investigate our hyperparameters $\alpha$, threshold $\psi$ and learning rate $\eta$ and report scores using Deeplabv1 with the ResNet-50 backbone trained on GTA. For self-adaptation, we fix one hyperparameter ($\psi$, $\eta$) while varying the other.
  • Figure 4: Qualitative semantic segmentation results for the generalization from GTA to Cityscapes, BDD, Mapillary, and IDD for the ResNet-50 backbone. We show the input image (top row), ground truth and the predictions of the baseline model and of our proposed self-adaptation (bottom row).
  • Figure 5: Mean IoU (%, $\uparrow$) using SaN based on the optimal alpha on the development set (WildDash). We report scores for the target domains (Cityscapes, BDD, IDD) for the ResNet-50 backbone after training on GTA (left) and SYNTHIA (right).
  • ...and 6 more figures
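
The pseudo-labeling step in the Figure 1 caption can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes the per-view softmax outputs have already been mapped back to the original image plane, it takes the class-dependent threshold to be `psi` times the highest averaged confidence observed for each class, and it uses the hypothetical names `pseudo_label`, `masked_cross_entropy`, and `IGNORE`. The gradient update of the network itself is not shown.

```python
import numpy as np

IGNORE = 255  # assumed label for pixels excluded from the loss


def pseudo_label(probs_per_view, psi=0.7):
    """Build a pseudo-label from augmented predictions (hypothetical helper).

    probs_per_view: (V, C, H, W) softmax outputs of V augmented views,
                    already transformed back to the original image plane.
    psi:            assumed scale for the class-dependent threshold.
    """
    # Average the softmax probabilities over all views.
    mean_probs = probs_per_view.mean(axis=0)        # (C, H, W)
    labels = mean_probs.argmax(axis=0)              # (H, W)
    confidence = mean_probs.max(axis=0)             # (H, W)

    # Assumption: one threshold per class, relative to that class's peak.
    class_threshold = psi * mean_probs.max(axis=(1, 2))   # (C,)
    keep = confidence >= class_threshold[labels]
    return np.where(keep, labels, IGNORE)


def masked_cross_entropy(probs, labels):
    """Mean cross-entropy over pixels that carry a pseudo-label."""
    valid = labels != IGNORE
    if not valid.any():
        return 0.0
    safe = np.where(valid, labels, 0)               # placeholder index for ignored pixels
    picked = np.take_along_axis(probs, safe[None], axis=0)[0]
    return float(-np.log(picked[valid] + 1e-12).mean())
```

In a full adaptation loop, the loss from `masked_cross_entropy` would drive a small number of parameter updates ($N_t$ in the caption), after which the final prediction is produced and the updated model is discarded.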