CFTrack: Enhancing Lightweight Visual Tracking through Contrastive Learning and Feature Matching
Juntao Liang, Jun Hou, Weijun Zhang, Yong Wang
TL;DR
CFTrack addresses the challenge of achieving high discriminative accuracy in lightweight visual tracking on resource-constrained devices. It introduces a Contrastive Feature Matching module, trained with an adaptive margin and an adaptive contrastive loss and integrated into a Siamese backbone, to enhance target-background separation and temporal consistency during inference. Extensive experiments on LaSOT, OTB100, UAV123, and HOOT show that CFTrack achieves strong accuracy while running at real-time speeds (e.g., 136 fps on the NVIDIA Jetson NX and 368 fps on an RTX 3090 GPU), outperforming many lightweight trackers. The work demonstrates the practical impact of combining online contrastive learning with feature matching for robust, efficient tracking in edge environments, and suggests that the module could serve as a plug-and-play component for similar trackers.
Abstract
Achieving both efficiency and strong discriminative ability in lightweight visual tracking remains a challenge, especially on mobile and edge devices with limited computational resources. Conventional lightweight trackers often struggle with robustness under occlusion and interference, while deep trackers, when compressed to meet resource constraints, suffer from performance degradation. To address these issues, we introduce CFTrack, a lightweight tracker that integrates contrastive learning and feature matching to enhance discriminative feature representations. CFTrack dynamically assesses target similarity during prediction through a novel contrastive feature matching module optimized with an adaptive contrastive loss, thereby improving tracking accuracy. Extensive experiments on LaSOT, OTB100, and UAV123 show that CFTrack surpasses many state-of-the-art lightweight trackers while operating at 136 frames per second on the NVIDIA Jetson NX platform. Results on the HOOT dataset further demonstrate CFTrack's strong discriminative ability under heavy occlusion.
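To make the idea of a margin-based contrastive loss over matched features concrete, the following is a minimal sketch and not the CFTrack implementation. It assumes cosine similarity between a template feature and candidate features, the classic pairwise contrastive loss (pull target candidates toward the template, push background candidates beyond a margin), and a purely hypothetical margin rule that widens the margin when background candidates resemble the target. The function names and the parameters `base_margin` and `scale` are illustrative assumptions; the abstract does not specify the actual formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adaptive_margin(neg_sims, base_margin=0.5, scale=0.25):
    """Hypothetical rule (an assumption, not from the paper): widen the
    margin when background candidates are, on average, similar to the target."""
    if not neg_sims:
        return base_margin
    hardness = max(0.0, sum(neg_sims) / len(neg_sims))
    return base_margin + scale * hardness


def contrastive_matching_loss(template, candidates, labels,
                              base_margin=0.5, scale=0.25):
    """Pairwise contrastive loss on cosine distance d = 1 - similarity.

    labels[i] == 1 marks a target candidate, 0 marks background.
    Returns (mean loss, margin used).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    sims = [cosine_similarity(template, c) for c in candidates]
    neg_sims = [s for s, y in zip(sims, labels) if y == 0]
    margin = adaptive_margin(neg_sims, base_margin, scale)

    total = 0.0
    for s, y in zip(sims, labels):
        d = 1.0 - s
        if y == 1:
            total += d * d                        # pull positives closer
        else:
            total += max(0.0, margin - d) ** 2    # push negatives past the margin
    return total / len(sims), margin


if __name__ == "__main__":
    template = [1.0, 0.0]
    candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    labels = [1, 0, 0]
    loss, margin = contrastive_matching_loss(template, candidates, labels)
    print(f"loss={loss:.4f} margin={margin:.4f}")
```

In this sketch, a background candidate that is orthogonal to the template contributes no loss, while a distractor that closely resembles the template both raises the margin and incurs a penalty, which mirrors the target-background separation goal described above.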
