Single Image Test-Time Adaptation for Segmentation

Klara Janouskova, Tamir Shor, Chaim Baskin, Jiri Matas

TL;DR

The paper investigates single-image Test-Time Adaptation (SITTA) for semantic segmentation, which adapts a pretrained model to a single unlabeled target image at inference time by optimizing self-supervised losses. It compares six TTA methods (entropy minimization, pseudo-labelling, augmented consistency, adversarial invariance, and two novel mask-based approaches: mask refinement and IoU-based refinement) and introduces an adversarially driven mask refinement framework. The strongest results come from ttaref with an IoU loss, which improves over the non-adapted baselines by up to 3.51 percentage points on GTA5-C and 3.28 points on COCO-C, although performance depends heavily on hyper-parameter settings. The study highlights the potential of per-image, self-supervised adaptation for domains with strict data governance and heterogeneous domain shifts, reveals significant sensitivity to the choice of loss and update strategy, and calls for evaluation on a broader range of models and datasets.
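The strongest variant above uses an IoU loss. As a rough illustration of what an IoU-style objective can look like, the sketch below implements a common soft-IoU (soft Jaccard) loss between predicted foreground probabilities and a binary pseudo-label mask for one class. It is an assumption that the paper's IoU loss takes exactly this form; the function name `soft_iou_loss` and the `eps` smoothing term are illustrative choices, not taken from the paper.

```python
def soft_iou_loss(probs, target, eps=1e-6):
    """Soft IoU (Jaccard) loss for a single class.

    probs:  predicted foreground probabilities in [0, 1], one per pixel
    target: binary (pseudo-)label mask, one 0/1 value per pixel
    Returns 1 - soft IoU, so a perfect match gives 0.
    Assumption: a standard soft-Jaccard form, not necessarily the paper's.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    # Soft intersection: sum of per-pixel products.
    inter = sum(p * t for p, t in zip(probs, target))
    # Soft union: per-pixel p + t - p*t, summed over the image.
    union = sum(p + t - p * t for p, t in zip(probs, target))
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (inter + eps) / (union + eps)


# Usage: a half-confident prediction against a two-pixel mask.
print(soft_iou_loss([0.5, 0.5], [1, 0]))  # ≈ 0.667
```

Unlike per-pixel cross-entropy, this loss scores the overlap of the whole mask, so it is insensitive to how many background pixels are predicted correctly. For multi-class segmentation, one common choice is to average it over classes; the sketch does not claim the paper does so.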

Abstract

Test-Time Adaptation (TTA) methods improve the robustness of deep neural networks to domain shift on a variety of tasks, such as image classification and segmentation. This work explores adapting segmentation models to a single unlabelled image, with no other data available at test time. In particular, it focuses on adaptation by optimizing self-supervised losses at test time. Multiple baselines based on different principles are evaluated under diverse conditions, and a novel adversarial training scheme is introduced for adaptation with mask refinement. Our additions to the baselines yield increases of 3.51 and 3.28 % over the non-adapted baselines; without these additions, the increases would be only 1.7 and 2.16 %.
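Entropy minimization is one of the self-supervised losses among the compared baselines. A minimal sketch of the objective, assuming the segmenter produces a vector of class logits per pixel: the loss is the mean Shannon entropy of the per-pixel softmax distributions, so minimizing it pushes the model toward confident predictions. The helper names `softmax` and `mean_pixel_entropy` are hypothetical, and the sketch computes only the loss value; in practice an autodiff framework would supply the gradient for the test-time update.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mean_pixel_entropy(pixel_logits):
    """Average Shannon entropy (in nats) of per-pixel class distributions.

    pixel_logits: list of per-pixel logit vectors (the image flattened).
    """
    total = 0.0
    for logits in pixel_logits:
        probs = softmax(logits)
        # Skip zero probabilities: p * log(p) -> 0 as p -> 0.
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(pixel_logits)


# Usage: one confident pixel and one maximally uncertain pixel.
print(mean_pixel_entropy([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]))
```

A uniform prediction over C classes has the maximum entropy, log C, while a near one-hot prediction has entropy close to zero, which is why this objective rewards confidence without needing any labels.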

Paper Structure (18 sections, 10 equations, 21 figures, 6 tables)

Figures (21)

  • Figure 1: The proposed experimental framework for SITTA. Hyper-parameters are selected on a synthetic dataset derived from the training set by applying a diverse set of corruptions. SITTA methods are then tested on real-world datasets with domain shift.
  • Figure 2: Mask refiner training (left) and ttaref TTA inference (right). During training, the segmenter outputs masks for a training image and for a corrupted version of it that simulates domain shift. The mask refiner is then trained to predict the clean-image mask given only the corrupted-image mask as input; no gradients flow back to the segmenter. At inference time, the segmenter output is fed into the refiner, and the refined output is used as a pseudo-label to fine-tune the segmenter. Each TTA iteration performs a single gradient update, after which the masks are recomputed. Because the segmenter output may change with the updated weights, the refiner may in turn produce a new, possibly better, pseudo-label. Visualized on a single-class prediction.
  • Figure 3: GTA5-C $\text{m}\overline{\text{IoU}}_i$ error reduction (%) as a function of corruption level, for TTA with the overall optimal hyper-parameters for GTA5-C.
  • Figure 4: The relationship between the per-image score (a) or entropy (b) before adaptation and the score after adaptation on the GTA5-C dataset. The change in $\text{m}\overline{\text{IoU}}_i$ due to TTA ($\text{m}\overline{\text{IoU}}_i \Delta$) is plotted against the non-adapted (NA) $\text{m}\overline{\text{IoU}}_i$ (a) or entropy (b). A least-squares line fitted to the points is shown in yellow.
  • Figure 5: GTA5-C error reduction difference (%) between the overall optimal hyper-parameters and hyper-parameters selected separately for each corruption type. All hyper-parameters were selected on GTA5-C.
  • ...and 16 more figures
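The ttaref inference procedure described in the Figure 2 caption (segmenter output, then refiner, then pseudo-label, then one gradient step, then recompute) can be sketched as a loop. This is a toy illustration under loud assumptions: the "segmenter" is a hypothetical 1-D logistic model over a row of pixel features, the "refiner" is a hypothetical majority filter rather than the learned refiner network, and binary cross-entropy is used as the fine-tuning loss because its gradient has a simple closed form, whereas the best reported variant uses an IoU loss. The names `segment`, `refine`, and `tta_ref` are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def segment(params, pixels):
    """Hypothetical toy segmenter: foreground probability per pixel
    from a 1-D feature via a logistic model with weights (w, b)."""
    w, b = params
    return [sigmoid(w * x + b) for x in pixels]


def refine(probs):
    """Hypothetical stand-in for the learned mask refiner: threshold at 0.5,
    then apply a 3-pixel majority filter (edges replicated) to remove
    isolated errors. The paper's refiner is a trained network."""
    hard = [1 if p >= 0.5 else 0 for p in probs]
    n = len(hard)
    refined = []
    for i in range(n):
        votes = hard[max(0, i - 1)] + hard[i] + hard[min(n - 1, i + 1)]
        refined.append(1 if votes >= 2 else 0)
    return refined


def tta_ref(params, pixels, steps=10, lr=0.5):
    """Refine-then-fine-tune loop: one gradient step per TTA iteration,
    recomputing the segmenter output and the pseudo-label every time."""
    w, b = params
    n = len(pixels)
    for _ in range(steps):
        probs = segment((w, b), pixels)  # current segmenter output
        pseudo = refine(probs)           # refined mask used as pseudo-label
        # Mean binary cross-entropy; its gradient w.r.t. each logit is (p - y).
        grad_w = sum((p - y) * x for p, y, x in zip(probs, pseudo, pixels)) / n
        grad_b = sum(p - y for p, y in zip(probs, pseudo)) / n
        w -= lr * grad_w                 # single gradient update ...
        b -= lr * grad_b                 # ... then the next iteration recomputes masks
    return w, b


pixels = [-2.0, -1.0, 1.0, 2.0]
w, b = tta_ref((0.5, 0.0), pixels)
print(w, b)
print(refine(segment((w, b), pixels)))
```

The point mirrored from the caption is that the pseudo-label is not fixed: it is recomputed after every update, so an improved segmenter output can yield an improved pseudo-label on the next iteration. In this toy run, the weight `w` grows as the model becomes more confident on the refined mask.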