UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network
Siyuan Yao, Yang Guo, Yanyang Yan, Wenqi Ren, Xiaochun Cao
TL;DR
UncTrack tackles unreliable target localization in visual object tracking by explicitly predicting localization uncertainty and exploiting it through a prototype memory network. The approach combines a transformer encoder, an uncertainty-aware localization decoder, and a prototype memory network that is updated with high-confidence samples to maintain robust target representations over time. Two-stage training and online memory updating enable strong performance under occlusion, deformation, motion blur, and background clutter, and the tracker achieves state-of-the-art results on multiple benchmarks. By modeling uncertainty and incorporating historical prototypes, UncTrack offers improved reliability for real-time tracking in safety-critical applications.
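The idea of a memory that is updated only with high-confidence samples can be made concrete with a minimal Python sketch. It is not UncTrack's implementation: the class name PrototypeMemory, the fixed uncertainty threshold, first-in-first-out eviction, and mean pooling of the stored features are all illustrative assumptions.

```python
from collections import deque
from typing import List, Sequence


class PrototypeMemory:
    """Hypothetical fixed-size memory bank gated by localization uncertainty.

    Assumptions (not taken from the paper): features are plain float vectors,
    a sample counts as "high confidence" when its uncertainty is below a fixed
    threshold, the oldest entry is evicted first, and the prototype is the
    element-wise mean of the stored features.
    """

    def __init__(self, capacity: int, max_uncertainty: float) -> None:
        self.bank: deque = deque(maxlen=capacity)  # oldest entry dropped when full
        self.max_uncertainty = max_uncertainty

    def update(self, feature: Sequence[float], uncertainty: float) -> bool:
        """Store the feature only if the prediction looks reliable."""
        if uncertainty >= self.max_uncertainty:
            return False  # low-confidence frame: leave the memory untouched
        self.bank.append(list(feature))
        return True

    def prototype(self) -> List[float]:
        """Element-wise mean of the stored features."""
        if not self.bank:
            raise ValueError("memory bank is empty")
        n = len(self.bank)
        return [sum(col) / n for col in zip(*self.bank)]


# Usage: the uncertain second sample is rejected, the others are stored.
memory = PrototypeMemory(capacity=2, max_uncertainty=0.5)
memory.update([1.0, 0.0], uncertainty=0.1)
memory.update([9.0, 9.0], uncertainty=0.9)
memory.update([3.0, 2.0], uncertainty=0.2)
print(memory.prototype())  # [2.0, 1.0]
```

Gating the update on uncertainty keeps unreliable frames (for example, heavily occluded ones) from contaminating the stored template representation, while the bounded capacity lets the prototype follow gradual appearance changes.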
Abstract
Transformer-based trackers have achieved notable success and become the dominant tracking paradigm owing to their accuracy and efficiency. Despite this substantial progress, most existing approaches treat object tracking as a deterministic coordinate regression problem and largely overlook target localization uncertainty, which hampers a tracker's ability to maintain reliable target state predictions in challenging scenarios. To address this issue, we propose UncTrack, a novel uncertainty-aware transformer tracker that predicts the target localization uncertainty and incorporates this uncertainty information to achieve accurate target state inference. Specifically, UncTrack uses a transformer encoder to perform feature interaction between the template and search images. The output features are passed to an uncertainty-aware localization decoder (ULD), which coarsely predicts the corner-based target location and the corresponding localization uncertainty. The localization uncertainty is then sent to a prototype memory network (PMN), which mines valuable historical information to determine whether the target state prediction is reliable. To enhance the template representation, high-confidence samples are fed back into the prototype memory bank for memory updating, making the tracker more robust to challenging appearance variations. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods. Our code is available at https://github.com/ManOfStory/UncTrack.
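The ULD step, predicting a corner-based box together with a localization uncertainty, can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that each corner is read out as the soft-argmax of a 2D score map and that the spread (standard deviation) of the resulting distribution serves as the uncertainty; the function names soft_argmax_with_spread and predict_box are hypothetical, and UncTrack's actual decoder may model uncertainty differently.

```python
import math
from typing import List, Tuple


def soft_argmax_with_spread(scores: List[List[float]]) -> Tuple[float, float, float]:
    """Return (x, y, spread) for one corner score map (rows indexed by y).

    The map is turned into a probability distribution with a softmax; the
    corner is its expected position and the spread is sqrt(var_x + var_y),
    used here as a stand-in localization uncertainty.
    """
    m = max(s for row in scores for s in row)  # subtract max for numerical stability
    weights = [[math.exp(s - m) for s in row] for row in scores]
    total = sum(sum(row) for row in weights)
    ex = ey = ex2 = ey2 = 0.0
    for y, row in enumerate(weights):
        for x, w in enumerate(row):
            p = w / total
            ex += p * x
            ey += p * y
            ex2 += p * x * x
            ey2 += p * y * y
    # Clamp tiny negative values caused by floating-point rounding.
    var = max(ex2 - ex * ex, 0.0) + max(ey2 - ey * ey, 0.0)
    return ex, ey, math.sqrt(var)


def predict_box(
    tl_scores: List[List[float]], br_scores: List[List[float]]
) -> Tuple[Tuple[float, float, float, float], float]:
    """Box (x1, y1, x2, y2) from top-left/bottom-right maps, plus mean corner spread."""
    x1, y1, u1 = soft_argmax_with_spread(tl_scores)
    x2, y2, u2 = soft_argmax_with_spread(br_scores)
    return (x1, y1, x2, y2), 0.5 * (u1 + u2)


# Usage: a sharp top-left map and a flat (ambiguous) bottom-right map.
sharp = [[20.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
blurry = [[0.0] * 3 for _ in range(3)]
box, uncertainty = predict_box(sharp, blurry)
print(box, uncertainty)
```

A sharply peaked map yields a near-zero spread, while a flat map yields a large one; this is the kind of per-frame signal a downstream module such as the PMN could use to decide whether a prediction is trustworthy.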
