
A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking

Yuki Minase, Kanji Tanaka

Abstract

Robust person tracking is a critical capability for autonomous mobile robots operating in diverse and unpredictable environments. While RGB-D tracking has shown high precision, its performance severely degrades under challenging illumination conditions, such as total darkness or intense backlighting. To achieve all-weather robustness, this paper proposes a novel Thermal-Infrared and Depth (TIR-D) tracking architecture that leverages the standard sensor suite of SLAM-capable robots, namely LiDAR and TIR cameras. A major challenge in TIR-D tracking is the scarcity of annotated multi-modal datasets. To address this, we introduce a sequential knowledge transfer strategy that evolves structural priors from a large-scale thermal-trained model into the TIR-D domain. By employing a differential learning rate scheme, termed the "Fine-grained Differential Learning Rate Strategy", we effectively preserve pre-trained feature extraction capabilities while enabling rapid adaptation to geometric depth cues. Experimental results demonstrate that our proposed TIR-D tracker achieves superior performance, with an Average Overlap (AO) of 0.700 and a Success Rate (SR) of 58.7%, significantly outperforming conventional RGB-transfer and single-modality baselines. Our approach provides a practical and resource-efficient solution for robust human-following in all-weather robotics applications.
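The differential learning rate idea from the abstract can be sketched as follows: parameters of the pre-trained thermal backbone receive a small learning rate so that learned feature extraction is preserved, while newly added depth-adaptation parameters receive a larger one for rapid adaptation. This is a minimal, framework-agnostic sketch; the module names (`thermal_backbone`, `depth_adapter`, `head`) and the rate values are illustrative assumptions, not taken from the paper.

```python
def build_param_groups(param_names, base_lr=1e-4, backbone_scale=0.1):
    """Split parameters into per-module learning-rate groups, in the shape
    an optimizer such as AdamW would consume them.

    Names and scales here are hypothetical placeholders."""
    groups = {
        "backbone": {"lr": base_lr * backbone_scale, "params": []},
        "adaptation": {"lr": base_lr, "params": []},
    }
    for name in param_names:
        if name.startswith("thermal_backbone."):
            # Pre-trained thermal features: small steps to avoid forgetting.
            groups["backbone"]["params"].append(name)
        else:
            # New depth-adaptation layer and tracking head: full-rate steps.
            groups["adaptation"]["params"].append(name)
    return groups

params = [
    "thermal_backbone.block1.weight",
    "thermal_backbone.block2.weight",
    "depth_adapter.proj.weight",
    "head.bbox.weight",
]
groups = build_param_groups(params)
```

In a deep-learning framework, these groups would be passed directly to the optimizer constructor so that each module trains at its own rate within a single optimization step.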

Figures (4)

  • Figure 3: Examples of multi-modal input pair. The TIR image (left) captures the heat signature of the person, while the projected LiDAR depth map (right) provides geometric information. Both are processed through the Thermal-Depth Adaptation Layer.
  • Figure 4: Detailed structure of the Thermal-Depth Adaptation Layer.
  • Figure 5: Experimental setup for TIR-D person tracking. (a) displays the target environment (a classroom); (b) shows the HIKMICRO Pocket2 thermal camera used for TIR-D data acquisition and the integrated LiDAR-camera system used for generating aligned depth maps.
  • Figure 6: Qualitative results of the proposed TIR-D tracker in various indoor scenarios. The tracker maintains robust bounding boxes on the TIR images despite changes in person distance and orientation, and despite overlapping heat signatures from furniture and other sources.
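Figure 3's projected LiDAR depth map can be illustrated with a minimal pinhole-projection sketch: each 3-D LiDAR point (already in the camera frame) is mapped to a pixel, keeping the nearest depth when multiple points land on the same pixel. The intrinsics and image size below are illustrative assumptions; a real pipeline would first apply the LiDAR-to-camera extrinsic calibration from the integrated sensor rig.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project 3-D points (camera frame, z forward) onto the image plane,
    keeping the nearest depth per pixel. Unfilled pixels stay 0.0."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera plane
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z  # keep the closest return
    return depth

# Two example points: one on the optical axis at 2 m, one offset at 4 m.
dm = project_to_depth_map(
    [(0.0, 0.0, 2.0), (0.5, 0.0, 4.0)],
    fx=100, fy=100, cx=16, cy=6, width=32, height=12,
)
```

The resulting sparse depth map is what the Thermal-Depth Adaptation Layer would consume alongside the TIR image; dense trackers typically interpolate or dilate such sparse projections first.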