Weakly Supervised Test-Time Domain Adaptation for Object Detection

Anh-Dzung Doan, Bach Long Nguyen, Terry Lim, Madhuka Jayawardhana, Surabhi Gupta, Christophe Guettier, Ian Reid, Markus Wagner, Tat-Jun Chin

TL;DR

This work tackles the challenge of deploying object detectors in evolving outdoor environments where distribution shifts degrade performance. It introduces Weakly Supervised Test-Time Adaptation (WSTTA), a human-in-the-loop method that uses weak labels provided by an operator to guide online, streaming adaptation of a Faster-RCNN detector, including updating BatchNorm layers and generating pseudo-labels. WSTTA combines pseudo-labelling, image-level recognition, and instance-level recognition within a unified loss to adapt in real time, and demonstrates superior performance over fully autonomous TTA baselines on benchmarks like KITTI→KITTI-Fog, Cityscapes→KITTI, and Visible→Infrared, with the added benefit of reduced labeling effort and privacy considerations. The approach shows strong potential for surveillance scenarios where a human supervisor can efficiently improve detection accuracy without full labeling, and it opens avenues for exploring when to adapt, alternative domain-gap minimisation techniques, and better human–AI interaction interfaces in test-time settings.

Abstract

Prior to deployment, an object detector is trained on a dataset compiled from a previous data collection campaign. However, the environment in which the object detector is deployed will invariably evolve, particularly in outdoor settings where changes in lighting, weather and seasons will significantly affect the appearance of the scene and target objects. It is almost impossible for all potential scenarios that the object detector may come across to be present in a finite training dataset. This necessitates continuous updates to the object detector to maintain satisfactory performance. Test-time domain adaptation techniques enable machine learning models to self-adapt based on the distributions of the testing data. However, existing methods mainly focus on fully automated adaptation, which makes sense for applications such as self-driving cars. Despite the prevalence of fully automated approaches, in some applications such as surveillance, there is usually a human operator overseeing the system's operation. We propose to involve the operator in test-time domain adaptation to raise the performance of object detection beyond what is achievable by fully automated adaptation. To reduce manual effort, the proposed method only requires the operator to provide weak labels, which are then used to guide the adaptation process. Furthermore, the proposed method can be performed in a streaming setting, where each online sample is observed only once. We show that the proposed method outperforms existing works, demonstrating a great benefit of human-in-the-loop test-time domain adaptation. Our code is publicly available at https://github.com/dzungdoan6/WSTTA

Paper Structure (26 sections, 9 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: (a) Previous works have focused on developing fully autonomous solutions, primarily for self-driving vehicles [cotta, dua, tent]. (b) Our approach, however, is proposed for visual surveillance systems, which are typically monitored by an operator. Our method therefore takes advantage of the operator's involvement in the adaptation process. (c) The definitions of full and weak labels: a full label includes bounding boxes and object categories, whereas a weak label only indicates which object categories are present in the image. By requiring only weak labels, our method significantly reduces the labour needed.
  • Figure 2: For an incoming target testing sample $x_t$, WSTTA first produces a prediction $\hat{y}_t$, and the operator provides a weak label $z_t$ for it. Using the prediction $\hat{y}_t$ and the weak label $z_t$, a pseudo-label $y^\text{psd}_t$ is then generated. Finally, the weak label $z_t$ and the pseudo-label $y^\text{psd}_t$ are used as ground truth for image-level recognition and instance-level recognition, respectively.
  • Figure 2: Comparison of AP50 for each object category, and of mAP across all categories, between WSTTA and the baselines (larger is better).
  • Figure 3: Sample images of our MSA-SYNTH dataset, where we simulate different environmental conditions.
  • Figure 4: Illustration of how noisy weak labels are simulated. Given a weak label {Car, Cyclist}, a corresponding multi-hot vector is created. If the noise ratio is 70%, the value 1 in the "Car" element has a 70% probability of being switched to 0 and a 30% probability of remaining 1. A similar operation is applied to the "Pedestrian" and "Cyclist" elements.
  • ...and 2 more figures
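
The weak-label definition in Figure 1(c) and the noise simulation in Figure 4 can be illustrated with a minimal Python sketch. This is not the authors' implementation. The category order `CATEGORIES`, the function names, the `(box, category)` layout of a full label and the `flip_zeros` switch are all assumptions made for illustration. The caption spells out only the 1 → 0 case for "Car"; treating "a similar operation" as an independent flip of every element, including 0 → 1 for absent categories, is one possible reading, and `flip_zeros=False` gives the other.

```python
import random

# Assumed category order for the multi-hot vector (hypothetical).
CATEGORIES = ["Car", "Pedestrian", "Cyclist"]


def weak_label_from_full(full_label):
    """Reduce a full label, given as (box, category) pairs, to a weak label.

    The weak label keeps only the set of categories present in the image.
    """
    return {category for _box, category in full_label}


def to_multi_hot(weak_label, categories=CATEGORIES):
    """Encode a weak label as a multi-hot vector over `categories`."""
    return [1 if category in weak_label else 0 for category in categories]


def add_noise(multi_hot, noise_ratio, rng, flip_zeros=True):
    """Flip each element independently with probability `noise_ratio`.

    With flip_zeros=False only 1 -> 0 flips are allowed (assumed variant).
    """
    noisy = []
    for value in multi_hot:
        eligible = value == 1 or flip_zeros
        if eligible and rng.random() < noise_ratio:
            noisy.append(1 - value)
        else:
            noisy.append(value)
    return noisy


# A full label with two boxes; the coordinates are made up for illustration.
full_label = [((10, 20, 110, 90), "Car"), ((200, 40, 230, 120), "Cyclist")]
weak = weak_label_from_full(full_label)
clean = to_multi_hot(weak)
noisy = add_noise(clean, noise_ratio=0.7, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the simulated noise reproducible across runs, which helps when comparing methods at a fixed noise ratio.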