Weakly Supervised Test-Time Domain Adaptation for Object Detection
Anh-Dzung Doan, Bach Long Nguyen, Terry Lim, Madhuka Jayawardhana, Surabhi Gupta, Christophe Guettier, Ian Reid, Markus Wagner, Tat-Jun Chin
TL;DR
This work addresses the deployment of object detectors in evolving outdoor environments, where distribution shifts degrade performance. It introduces Weakly Supervised Test-Time Adaptation (WSTTA), a human-in-the-loop method in which an operator provides weak labels that guide online, streaming adaptation of a Faster R-CNN detector, including updating its BatchNorm layers and generating pseudo-labels. WSTTA combines pseudo-labelling, image-level recognition, and instance-level recognition in a unified loss so that adaptation runs in real time. It outperforms fully autonomous TTA baselines on benchmarks including KITTI→KITTI-Fog, Cityscapes→KITTI, and Visible→Infrared, with the added benefits of reduced labelling effort and privacy considerations. The approach is well suited to surveillance scenarios, where a human supervisor can improve detection accuracy without full labelling. It also opens avenues for future work: deciding when to adapt, alternative domain-gap minimisation techniques, and better human–AI interaction interfaces in test-time settings.
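The summary above does not say how the BatchNorm layers are updated. As a purely illustrative sketch, one common option in test-time adaptation is to blend the statistics of each incoming test batch into the layer's running mean and variance with an exponential moving average, processing every batch exactly once. The function names `update_running_stats` and `adapt_on_stream`, the `momentum` value, and the use of a biased variance are assumptions made for this example, not details of WSTTA.

```python
from typing import Iterable, List, Tuple

Batch = List[List[float]]  # one batch: samples x channels (hypothetical layout)


def update_running_stats(
    running_mean: List[float],
    running_var: List[float],
    batch: Batch,
    momentum: float = 0.1,  # assumed value, not taken from the paper
) -> Tuple[List[float], List[float]]:
    """Blend the per-channel statistics of one test batch into the running estimates."""
    n = len(batch)
    new_mean, new_var = [], []
    for c in range(len(running_mean)):
        values = [sample[c] for sample in batch]
        mean_c = sum(values) / n
        # Biased (population) variance, chosen here for simplicity.
        var_c = sum((v - mean_c) ** 2 for v in values) / n
        new_mean.append((1 - momentum) * running_mean[c] + momentum * mean_c)
        new_var.append((1 - momentum) * running_var[c] + momentum * var_c)
    return new_mean, new_var


def adapt_on_stream(
    stream: Iterable[Batch],
    running_mean: List[float],
    running_var: List[float],
    momentum: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Streaming setting: each batch is seen once, used for an update, then discarded."""
    for batch in stream:
        running_mean, running_var = update_running_stats(
            running_mean, running_var, batch, momentum
        )
    return running_mean, running_var
```

In a real detector these statistics would live inside the normalisation layers of the backbone, and the weak labels and pseudo-labels would drive an additional gradient-based update that this sketch leaves out.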
Abstract
Prior to deployment, an object detector is trained on a dataset compiled from a previous data collection campaign. However, the environment in which the object detector is deployed will invariably evolve, particularly in outdoor settings where changes in lighting, weather and seasons significantly affect the appearance of the scene and target objects. A finite training dataset cannot realistically contain every scenario that the object detector may encounter. This necessitates continuous updates to the object detector to maintain satisfactory performance. Test-time domain adaptation techniques enable machine learning models to self-adapt based on the distribution of the testing data. However, existing methods mainly focus on fully automated adaptation, which makes sense for applications such as self-driving cars. In other applications, such as surveillance, there is usually a human operator overseeing the system's operation. We propose to involve the operator in test-time domain adaptation to raise the performance of object detection beyond what is achievable by fully automated adaptation. To reduce manual effort, the proposed method only requires the operator to provide weak labels, which are then used to guide the adaptation process. Furthermore, the proposed method can be performed in a streaming setting, where each online sample is observed only once. We show that the proposed method outperforms existing works, demonstrating the benefit of human-in-the-loop test-time domain adaptation. Our code is publicly available at https://github.com/dzungdoan6/WSTTA
