
A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking

Yuki Minase, Kanji Tanaka

Abstract

Robust person tracking is a critical capability for autonomous mobile robots operating in diverse and unpredictable environments. While RGB-D tracking has shown high precision, its performance severely degrades under challenging illumination conditions, such as total darkness or intense backlighting. To achieve all-weather robustness, this paper proposes a novel Thermal-Infrared and Depth (TIR-D) tracking architecture that leverages the standard sensor suite of SLAM-capable robots, namely LiDAR and TIR cameras. A major challenge in TIR-D tracking is the scarcity of annotated multi-modal datasets. To address this, we introduce a sequential knowledge transfer strategy that evolves structural priors from a large-scale thermal-trained model into the TIR-D domain. By employing a differential learning rate scheme, termed the "Fine-grained Differential Learning Rate Strategy", we effectively preserve pre-trained feature extraction capabilities while enabling rapid adaptation to geometric depth cues. Experimental results demonstrate that our proposed TIR-D tracker achieves superior performance, with an Average Overlap (AO) of 0.700 and a Success Rate (SR) of 58.7%, significantly outperforming conventional RGB-transfer and single-modality baselines. Our approach provides a practical and resource-efficient solution for robust human-following in all-weather robotics applications.
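The differential learning rate idea from the abstract can be sketched as follows: parameters of the pre-trained thermal backbone receive a small learning rate so that learned feature extraction is preserved, while newly added depth-adaptation parameters receive a larger one for rapid adaptation. This is a minimal, framework-agnostic sketch; the module names (`thermal_backbone`, `depth_adapter`, `head`) and the rate values are illustrative assumptions, not taken from the paper.

```python
def build_param_groups(param_names, base_lr=1e-4, backbone_scale=0.1):
    """Split parameters into per-module learning-rate groups, in the shape
    an optimizer such as AdamW would consume them.

    Names and scales here are hypothetical placeholders."""
    groups = {
        "backbone": {"lr": base_lr * backbone_scale, "params": []},
        "adaptation": {"lr": base_lr, "params": []},
    }
    for name in param_names:
        if name.startswith("thermal_backbone."):
            # Pre-trained thermal features: small steps to avoid forgetting.
            groups["backbone"]["params"].append(name)
        else:
            # New depth-adaptation layer and tracking head: full-rate steps.
            groups["adaptation"]["params"].append(name)
    return groups

params = [
    "thermal_backbone.block1.weight",
    "thermal_backbone.block2.weight",
    "depth_adapter.proj.weight",
    "head.bbox.weight",
]
groups = build_param_groups(params)
```

In a deep-learning framework, these groups would be passed directly to the optimizer constructor so that each module trains at its own rate within a single optimization step.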

Figures (4)

  • Figure 3: Examples of multi-modal input pair. The TIR image (left) captures the heat signature of the person, while the projected LiDAR depth map (right) provides geometric information. Both are processed through the Thermal-Depth Adaptation Layer.
  • Figure 4: Detailed structure of the Thermal-Depth Adaptation Layer.
  • Figure 5: Experimental setup for TIR-D person tracking. (a) displays the target environment (a classroom); (b) shows the HIKMICRO Pocket2 thermal camera used for TIR-D data acquisition and the integrated LiDAR-camera system used for generating aligned depth maps.
  • Figure 6: Qualitative results of the proposed TIR-D tracker in various indoor scenarios. The tracker maintains robust bounding boxes on the TIR images despite changes in person distance and orientation, and despite overlapping heat signatures from furniture and other sources.
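Figure 3's projected LiDAR depth map can be illustrated with a minimal pinhole-projection sketch: each 3-D LiDAR point (already in the camera frame) is mapped to a pixel, keeping the nearest depth when multiple points land on the same pixel. The intrinsics and image size below are illustrative assumptions; a real pipeline would first apply the LiDAR-to-camera extrinsic calibration from the integrated sensor rig.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project 3-D points (camera frame, z forward) onto the image plane,
    keeping the nearest depth per pixel. Unfilled pixels stay 0.0."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera plane
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z  # keep the closest return
    return depth

# Two example points: one on the optical axis at 2 m, one offset at 4 m.
dm = project_to_depth_map(
    [(0.0, 0.0, 2.0), (0.5, 0.0, 4.0)],
    fx=100, fy=100, cx=16, cy=6, width=32, height=12,
)
```

The resulting sparse depth map is what the Thermal-Depth Adaptation Layer would consume alongside the TIR image; dense trackers typically interpolate or dilate such sparse projections first.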