TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection
Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen
TL;DR
TFDet addresses the false positives that noisy fused feature maps cause in multispectral RGB-T pedestrian detection by introducing a target-aware fusion framework. Its fusion pipeline has two stages: a Feature Fusion Module (FFM) that exploits parallel- and cross-channel multispectral similarities, and a Feature Refinement Module (FRM) that learns to distinguish targets from background via a box-level mask and a correlation-maximum loss. The correlation-maximum loss jointly supervises a segmentation-based target mask and enforces high correlation between the predicted mask and the fused features, which reduces false positives and increases target contrast. Empirically, TFDet achieves state-of-the-art results on the KAIST and LLVIP pedestrian benchmarks and extends to multi-class detection on FLIR and M3FD, while keeping inference speed comparable to previous approaches. This makes it practical for robust low-light pedestrian detection in road safety scenarios.
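To make the idea of a box-level mask and a correlation-based objective concrete, here is a minimal NumPy sketch. It is not the paper's formulation: this summary does not give the exact correlation-maximum loss, so the code assumes a Pearson correlation between the channel-averaged feature response and the mask as a stand-in. The names `box_mask` and `correlation_loss`, and the integer `(x1, y1, x2, y2)` box format, are hypothetical.

```python
import numpy as np


def box_mask(boxes, height, width):
    """Rasterize integer (x1, y1, x2, y2) boxes into a binary H x W mask.

    Assumed box convention: x2 and y2 are exclusive pixel indices.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1.0
    return mask


def correlation_loss(features, mask, eps=1e-8):
    """Return 1 - Pearson correlation between feature response and mask.

    features: array of shape (C, H, W); mask: array of shape (H, W).
    Assumption: the response is the channel-wise mean. Minimizing this
    value pushes the response to be high inside boxes and low outside.
    """
    response = features.mean(axis=0)
    r = response - response.mean()
    m = mask - mask.mean()
    corr = (r * m).sum() / (np.sqrt((r ** 2).sum() * (m ** 2).sum()) + eps)
    return 1.0 - corr


if __name__ == "__main__":
    mask = box_mask([(1, 1, 3, 3)], height=4, width=4)
    aligned = np.stack([2.0 * mask + 0.5, 4.0 * mask])  # bright on target
    inverted = np.stack([1.0 - mask, 1.0 - mask])       # bright on background
    print("aligned loss:", correlation_loss(aligned, mask))
    print("inverted loss:", correlation_loss(inverted, mask))
```

Under this assumed form, features that respond only inside the boxes give a loss near 0, while features that respond only on the background give a loss near 2, which matches the intuition of raising target contrast to suppress false positives.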
Abstract
Pedestrian detection plays a critical role in computer vision because it contributes to traffic safety. Existing methods that rely solely on RGB images suffer from performance degradation under low-light conditions due to the lack of useful information. To address this issue, recent multispectral detection approaches incorporate thermal images as complementary information and achieve improved performance. Nevertheless, few approaches focus on the negative effects of false positives caused by noisy fused feature maps. In contrast, we comprehensively analyze the impact of false positives on detection performance and find that enhancing feature contrast can significantly reduce them. In this paper, we propose a novel target-aware fusion strategy for multispectral pedestrian detection, named TFDet. TFDet achieves state-of-the-art performance on two multispectral pedestrian benchmarks, KAIST and LLVIP. It also extends easily to multi-class object detection, where it outperforms the previous best approaches on two multispectral object detection benchmarks, FLIR and M3FD. Importantly, TFDet has inference efficiency comparable to previous approaches and detects well even under low-light conditions, which is a significant advancement for road safety.
