Progressive Domain Adaptation for Thermal Infrared Object Tracking

Qiao Li, Kanlun Tan, Qiao Liu, Di Yuan, Xin Li, Yunpeng Liu

TL;DR

PDAT tackles the domain gap between RGB-trained trackers and Thermal Infrared data by transferring RGB priors through a progressive domain adaptation pipeline. It combines SAM-based pseudo-labeling, adversarial global domain alignment, and clustering-based subdomain alignment to learn domain-invariant features using a large unlabeled TIR dataset. Empirical results across five TIR benchmarks show roughly 6% improvements in tracking precision and success rate over strong RGB-based baselines, with robust performance across diverse scenarios. This approach enables effective TIR tracking without collecting large-scale labeled TIR data, offering practical benefits for nighttime robotics and surveillance applications.

Abstract

Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant performance drop on TIR data due to domain shift. To address this, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful knowledge learned from RGB tracking to TIR tracking. The framework makes full use of large-scale labeled RGB datasets without requiring time-consuming and labor-intensive labeling of large-scale TIR data. Specifically, we first propose an adversarial-based global domain adaptation module to coarsely reduce the domain gap at the feature level. Second, we design a clustering-based subdomain adaptation method to further align the feature distributions of the RGB and TIR datasets at a finer granularity. Through progressive training, these two domain adaptation modules gradually eliminate the discrepancy between the two domains and thus learn domain-invariant, fine-grained features. Additionally, we collect a large-scale TIR dataset with over 1.48 million unlabeled TIR images for training the proposed domain adaptation framework. Experimental results on five TIR tracking benchmarks show that the proposed method improves the success rate by nearly 6%, demonstrating its effectiveness.
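
The coarse-to-fine idea in the abstract can be illustrated with a toy example. The sketch below is not the paper's AGDA or CSDA module: it replaces the adversarial objective with a simple difference of feature means, uses a plain k-means with a deterministic initialisation, and runs on made-up 2-D "features". All function names (`global_gap`, `subdomain_gap`, `kmeans`) and the data are hypothetical. It only shows why two domains can look aligned globally while their clusters (subdomains) are still misaligned, which is the motivation for adding a finer alignment stage.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def nearest(p, centers):
    """Index of the cluster center closest to p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def global_gap(src, tgt):
    """Coarse discrepancy: distance between the overall domain means."""
    return sq_dist(mean(src), mean(tgt))

def subdomain_gap(src, tgt, centers):
    """Fine discrepancy: average distance between per-cluster domain means."""
    total, used = 0.0, 0
    for i in range(len(centers)):
        s = [p for p in src if nearest(p, centers) == i]
        t = [p for p in tgt if nearest(p, centers) == i]
        if s and t:  # only clusters that contain samples from both domains
            total += sq_dist(mean(s), mean(t))
            used += 1
    return total / used if used else 0.0

# Made-up features: "rgb" plays the source domain, "tir" the target domain.
rgb = [[-5.5, 1.0], [-4.5, 1.0], [4.5, -1.0], [5.5, -1.0]]
tir = [[-5.5, -1.0], [-4.5, -1.0], [4.5, 1.0], [5.5, 1.0]]

centers = kmeans(rgb + tir, k=2)          # clusters shared by both domains
coarse_gap = global_gap(rgb, tir)         # 0.0: the overall means coincide
fine_gap = subdomain_gap(rgb, tir, centers)  # 4.0: each cluster is still shifted
print(coarse_gap, fine_gap)
```

In this toy setting the global statistic reports no gap at all, while the per-cluster statistic still sees a shift inside every cluster. A training loss built only on the global term would stop improving here, which is one way to read the abstract's split into a coarse global stage followed by a finer subdomain stage.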

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: t-SNE visualization of the feature distributions of the baseline and the proposed progressive domain adaptation model. Numbers ① and ② represent the TIR and RGB domain samples, respectively. With the baseline feature extractor, the feature distributions of similar samples from different domains lie far apart, whereas applying our global domain adaptation and subdomain adaptation modules progressively narrows this distance.
  • Figure 2: Proposed progressive domain adaptation TIR tracking framework (PDAT), which mainly consists of three parts: Segment Anything Model (SAM) based data preprocessing, Adversarial-based Global Domain Adaptation (AGDA), and Clustering-based SubDomain Adaptation (CSDA). SAM is used to generate a large amount of pseudo-labeled TIR training data that resembles the source samples. AGDA coarsely aligns the global domains, while CSDA further aligns the subdomains at a finer level. Stage 1 to Stage 4 denote the feature extraction blocks of the backbone.
  • Figure 3: Comparison of results obtained by three pseudo-label generation preprocessing methods. It can be seen that the SAM-based method obtains more sample pairs with higher diversity.
  • Figure 4: Qualitative comparison between the proposed method (PDAT-CAR) and several state-of-the-art trackers on the similar distractor and background clutter challenges of LSOTB-TIR100.
  • Figure 5: Comparison of the confidence maps generated by the proposed PDAT-CAR and the baseline SiamCAR on several challenging sequences of LSOTB-TIR120. The green bounding box represents the ground truth of the target. PDAT-CAR significantly reduces interference from background clutter and similar distractors.
  • ...and 3 more figures