D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang

TL;DR

The paper tackles the challenging problem of unsupervised domain adaptation for object detection from RGB to thermal imagery, where the domain gap is substantial and labeled thermal data are scarce. It introduces Distinctive Dual-Domain Teacher (D3T), a Mean Teacher-based framework with two domain-specific teachers (RGB and thermal) and a zigzag learning schedule that progressively transfers knowledge from RGB to the thermal domain. The method combines ground-truth supervision on RGB with cross-domain pseudo-labels from both teachers, using EMA to update teachers and a dynamic training cadence to minimize negative transfer. Empirical results on FLIR and KAIST show that D3T substantially outperforms prior approaches, with ablations demonstrating the incremental benefits of dual teachers, zigzag learning, and knowledge incorporation, highlighting its practical potential for reliable thermal-object detection in challenging environments.

Abstract

Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation cannot successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets to build dual teachers, and successively apply exponential moving average updates from the student model to the individual teacher of each domain. The framework further incorporates a zigzag learning method between the dual teachers, facilitating a gradual transition from the visible to the thermal domain during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T .
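
As a rough illustration of the exponential moving average (EMA) step mentioned in the abstract, the sketch below blends student weights into one of two domain-specific teachers. It is a minimal example and not the authors' implementation: the function name `ema_update`, the dictionary-of-floats weight format, and the keep rate `alpha=0.9` are all assumptions chosen for readability.

```python
def ema_update(teacher, student, alpha=0.9):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` are hypothetical dicts mapping parameter names
    to floats; a real detector would hold tensors instead.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


# Toy weights (illustrative values only).
student = {"w": 1.0}
rgb_teacher = {"w": 0.0}
thermal_teacher = {"w": 0.0}

# After a training step on thermal images, only the thermal teacher
# receives the EMA update; the RGB teacher is left untouched.
thermal_teacher = ema_update(thermal_teacher, student, alpha=0.9)
```

A keep rate close to 1 makes each teacher change slowly, which is the usual motivation for EMA teachers in Mean Teacher-style training; the value actually used by D3T is not stated in this summary.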

Paper Structure (16 sections, 8 equations, 6 figures, 6 tables, 1 algorithm)

This paper contains 16 sections, 8 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) Sample images showing the difference between unsupervised domain adaptation from RGB to RGB domains and unsupervised domain adaptation from RGB to thermal domains. (b) Conceptual illustration of the proposed unsupervised domain adaptation using distinctive dual-domain teachers, demonstrating the zigzag approach across the large RGB-thermal gap.
  • Figure 2: Overview of D3T: Our D3T model consists of two stages. Burn-in Stage: We initiate the training of the object detector using labeled data from the RGB domain. Zigzag Learning Stage: Comprises two distinct and interleaved training components for the Thermal domain and the RGB domain, respectively. During each step of training, the student model utilizes images from a single domain for training but leverages knowledge from two teachers for enhanced learning effectiveness. In each step, only one teacher model is updated corresponding to the trained domain.
  • Figure 3: Dual-teachers' pseudo-labels at different training stages. (a) and (b) are pseudo-labels from the RGB and thermal teacher models in early training stages, respectively, while (c) and (d) are pseudo-labels from the same models in later training stages.
  • Figure 4: Visualization of UDA results for object detection models: Source only, EPM (Hsu et al., 2020), our D3T, and ground truth labels in the FLIR dataset RGB $\rightarrow$ thermal domain. Green and red boxes denote the person and car classes, respectively.
  • Figure 5: Visualization of UDA results for object detection models: Source only, EPM (Hsu et al., 2020), our D3T, and ground truth labels in the KAIST dataset RGB $\rightarrow$ thermal domain. Green boxes denote the person class.
  • ...and 1 more figure
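
The Zigzag Learning Stage described in the Figure 2 caption, where each step trains on a single domain and updates only that domain's teacher, might be sketched as follows. This is a hedged toy example: the helper `zigzag_domains` and the fixed run lengths `thermal_run` and `rgb_run` are hypothetical, and the paper's actual schedule (including how the balance between domains changes over training) is not specified in this summary.

```python
from itertools import cycle, islice


def zigzag_domains(total_steps, thermal_run=2, rgb_run=1):
    """List the domain trained at each step, alternating in fixed-length runs.

    The run lengths are illustrative assumptions, not values from the paper.
    """
    pattern = ["thermal"] * thermal_run + ["rgb"] * rgb_run
    return list(islice(cycle(pattern), total_steps))


schedule = zigzag_domains(6)

# Count how often each teacher would receive an EMA update.
teacher_updates = {"thermal": 0, "rgb": 0}
for domain in schedule:
    # The student would train on a batch from `domain` here, guided by
    # knowledge from both teachers (omitted in this sketch).
    teacher_updates[domain] += 1  # only the trained domain's teacher is updated
```

With these assumed run lengths, the thermal teacher is updated twice as often as the RGB teacher; shifting the ratio over time is one way a gradual move toward the thermal domain could be expressed.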