HIPTrack: Visual Tracking with Historical Prompts
Wenrui Cai, Qingjie Liu, Yunhong Wang
TL;DR
HIPTrack addresses visual tracking under target appearance variations by introducing a historical prompt network. The network encodes refined historical foreground masks and visual features of the target into a memory bank, then adaptively decodes prompts for the current search region. The tracker builds on a frozen Vision Transformer backbone and a lightweight encoder–decoder memory module, so history-aware prompts are generated without retraining the entire model. It achieves state-of-the-art results on LaSOT, LaSOT_{ext}, GOT-10k, and NfS while running efficiently. The historical prompts act as a plug-and-play enhancement that improves robustness to occlusion, deformation, and scale variation, and ablations confirm the importance of memory size, update cadence, and the quality of the encoded history. Overall, HIPTrack demonstrates that precise, up-to-date historical information, accessed through prompt learning and memory retrieval, can substantially improve Siamese-style trackers in real-world scenarios.
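To make the idea of writing history into a memory bank and decoding it for the current frame more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed-capacity first-in-first-out memory of (key, value) feature vectors and a scaled dot-product attention readout; the class name `HistoryMemory`, its methods, and the eviction policy are all hypothetical choices made for illustration.

```python
import math
from collections import deque


class HistoryMemory:
    """Toy FIFO memory of (key, value) feature vectors (hypothetical, for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.entries = deque(maxlen=capacity)

    def write(self, key, value):
        """Store one historical entry, e.g. features from an earlier frame."""
        self.entries.append((list(key), list(value)))

    def read(self, query):
        """Return a softmax-weighted average of stored values for the given query."""
        if not self.entries:
            raise ValueError("memory is empty")
        scale = math.sqrt(len(query))
        # Scaled dot-product similarity between the query and every stored key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key, _ in self.entries
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(self.entries[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, self.entries)) / total
            for i in range(dim)
        ]
```

In this sketch, a query derived from the current search region would retrieve a blend of stored values weighted by similarity to the stored keys, which is one common way to realize "adaptive decoding" from a memory; the actual encoder and decoder in HIPTrack may differ.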
Abstract
Trackers that follow the Siamese paradigm use similarity matching between template and search region features for tracking. Many methods have been explored to enhance tracking performance by incorporating tracking history, so as to better handle target appearance variations such as deformation and occlusion. However, existing methods use historical information insufficiently and incompletely, and they typically require repeated training and introduce a large amount of computation. In this paper, we show that providing a Siamese-paradigm tracker with precise and updated historical information yields a significant performance improvement with completely unchanged parameters. Based on this, we propose a historical prompt network that uses refined historical foreground masks and historical visual features of the target to provide comprehensive and precise prompts for the tracker. We build a novel tracker called HIPTrack based on the historical prompt network, which achieves considerable performance improvements without the need to retrain the entire model. We conduct experiments on seven datasets, and the results demonstrate that our method surpasses the current state-of-the-art trackers on LaSOT, LaSOT_{ext}, GOT-10k, and NfS. Furthermore, the historical prompt network can be seamlessly integrated as a plug-and-play module into existing trackers, providing performance enhancements. The source code is available at https://github.com/WenRuiCai/HIPTrack.
