Single Image Test-Time Adaptation for Segmentation
Klara Janouskova, Tamir Shor, Chaim Baskin, Jiri Matas
TL;DR
The paper investigates single-image Test-Time Adaptation (SITTA) for semantic segmentation, which adapts a pretrained model to a single unlabeled target image at inference time using self-supervised losses. It compares six TTA methods (including entropy minimization, pseudo-labelling, augmented-consistency, adversarial-invariance, and two novel mask-based approaches: mask-refinement and IoU-based refinement) and introduces an adversarially-driven mask refinement framework. The strongest results come from ttaref with IoU loss, which improves over non-adapted baselines by up to 3.51 percentage points on GTA5-C and 3.28 points on COCO-C; performance depends strongly on hyper-parameter settings. The study highlights the potential of per-image, self-supervised adaptation for domains with strict data governance and heterogeneous domain shifts. It also reveals significant sensitivity to the choice of loss and update strategy, and calls for evaluation on a broader range of models and datasets.
Abstract
Test-Time Adaptation (TTA) methods improve the robustness of deep neural networks to domain shift on a variety of tasks such as image classification or segmentation. This work explores adapting segmentation models to a single unlabelled image with no other data available at test-time. In particular, this work focuses on adaptation by optimizing self-supervised losses at test-time. Multiple baselines based on different principles are evaluated under diverse conditions, and a novel adversarial training scheme is introduced for adaptation with mask refinement. Our additions to the baselines result in increases of 3.51 and 3.28 % over non-adapted baselines; without these improvements, the increases would be only 1.7 and 2.16 %.
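
To make the idea of optimizing a self-supervised loss on a single test image concrete, here is a minimal sketch of entropy minimization, one of the baseline principles named above. It is not the paper's implementation. The toy linear per-pixel classifier, the function names, the learning rate and the step count are all illustrative assumptions. A real segmentation network would be adapted with an automatic-differentiation framework rather than the hand-derived gradient used here.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy of one pixel's class distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def predict(pixels, w, b):
    # Hypothetical toy "segmentation model": class k has logit w[k] * x + b[k]
    # for a pixel with scalar intensity x.
    return [softmax([w[k] * x + b[k] for k in range(len(w))]) for x in pixels]

def mean_entropy(pixels, w, b):
    # Self-supervised loss: average prediction entropy over the image.
    preds = predict(pixels, w, b)
    return sum(entropy(p) for p in preds) / len(preds)

def adapt_single_image(pixels, w, b, lr=0.1, steps=20):
    # Gradient descent on the mean entropy of ONE unlabelled image.
    # The parameters are copied, so the source model stays untouched and
    # can be reset before the next test image.
    w, b = list(w), list(b)
    n = len(pixels)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = [0.0] * len(b)
        for x, p in zip(pixels, predict(pixels, w, b)):
            h = entropy(p)
            for k, pk in enumerate(p):
                if pk <= 0.0:
                    continue
                # dH/dz_k = -p_k * (log p_k + H) for softmax outputs.
                g = -pk * (math.log(pk) + h)
                grad_w[k] += g * x / n
                grad_b[k] += g / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b = [bk - lr * gk for bk, gk in zip(b, grad_b)]
    return w, b

if __name__ == "__main__":
    image = [0.1, 0.2, 0.4, 0.6, 0.9]        # toy "image" of five pixel intensities
    w_src, b_src = [1.0, -1.0], [-0.5, 0.5]  # assumed pretrained parameters
    w_ada, b_ada = adapt_single_image(image, w_src, b_src)
    print("entropy before:", mean_entropy(image, w_src, b_src))
    print("entropy after: ", mean_entropy(image, w_ada, b_ada))
```

The sketch restarts from the source parameters for every image. This is one possible update strategy, chosen here for simplicity. Entropy minimization only makes predictions more confident and does not by itself make them more correct, so it should be read as an illustration of the optimization loop rather than of the accuracy gains reported above.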
