UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network

Siyuan Yao, Yang Guo, Yanyang Yan, Wenqi Ren, Xiaochun Cao

TL;DR

UncTrack tackles unreliable target localization in visual object tracking by explicitly predicting localization uncertainty and exploiting it through a prototype memory network. The approach combines a transformer encoder, an uncertainty-aware localization decoder, and a prototype memory network that updates with high-confidence samples to maintain robust target representations across time. Two-stage training and online memory updating enable strong performance under occlusion, deformation, motion blur, and clutter, achieving state-of-the-art results on multiple benchmarks. By modeling uncertainty and incorporating historical prototypes, UncTrack offers improved reliability for real-time tracking in safety-critical applications.

Abstract

Transformer-based trackers have achieved promising success and become the dominant tracking paradigm due to their accuracy and efficiency. Despite this substantial progress, most existing approaches treat object tracking as a deterministic coordinate regression problem and largely overlook target localization uncertainty, which hampers a tracker's ability to maintain reliable target state prediction in challenging scenarios. To address this issue, we propose UncTrack, a novel uncertainty-aware transformer tracker that predicts the target localization uncertainty and incorporates this uncertainty information for accurate target state inference. Specifically, UncTrack utilizes a transformer encoder to perform feature interaction between template and search images. The output features are passed into an uncertainty-aware localization decoder (ULD) to coarsely predict the corner-based localization and the corresponding localization uncertainty. The localization uncertainty is then sent into a prototype memory network (PMN), which mines valuable historical information to identify whether the target state prediction is reliable. To enhance the template representation, samples with high confidence are fed back into the prototype memory bank for memory updating, making the tracker more robust to challenging appearance variations. Extensive experiments demonstrate that our method outperforms other state-of-the-art methods. Our code is available at https://github.com/ManOfStory/UncTrack.
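
The two ideas in the abstract, predicting a corner together with its localization uncertainty and updating a memory bank only with confident samples, can be illustrated with a small sketch. This is not the authors' ULD or PMN: it assumes a soft-argmax over a corner heatmap, uses the spatial spread of that heatmap as a stand-in for uncertainty, and gates a first-in-first-out bank with a fixed threshold. The names `corner_with_uncertainty`, `PrototypeMemory`, `capacity`, and `max_uncertainty` are hypothetical.

```python
from collections import deque

import numpy as np


def corner_with_uncertainty(logits):
    """Soft-argmax over a 2D corner heatmap (illustrative assumption).

    Returns the expected (x, y) position and the spatial spread of the
    heatmap, used here as a simple proxy for localization uncertainty.
    """
    h, w = logits.shape
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    ex = float((probs * xs).sum())
    ey = float((probs * ys).sum())
    spread = float((probs * ((xs - ex) ** 2 + (ys - ey) ** 2)).sum())
    return (ex, ey), spread


class PrototypeMemory:
    """Hypothetical FIFO bank that accepts only low-uncertainty samples."""

    def __init__(self, capacity=4, max_uncertainty=1.0):
        self.bank = deque(maxlen=capacity)  # oldest entry is dropped when full
        self.max_uncertainty = max_uncertainty

    def maybe_update(self, feature, uncertainty):
        # Reject unreliable samples so they cannot pollute the prototypes.
        if uncertainty > self.max_uncertainty:
            return False
        self.bank.append(np.asarray(feature, dtype=float))
        return True

    def prototype(self):
        # Average the stored features; None while the bank is empty.
        if not self.bank:
            return None
        return np.mean(np.stack(list(self.bank)), axis=0)
```

A sharply peaked heatmap yields a spread near zero and is admitted to the bank, while a flat heatmap (as might occur under occlusion) yields a large spread and is rejected. The threshold and capacity are arbitrary illustration values, not settings from the paper.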

Paper Structure

This paper contains 24 sections, 16 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: A general introduction to our approach. (a) The uncertainty curves of the top-left and bottom-right corners predicted by the proposed UncTrack in a challenging video sequence; object occlusion raises the localization uncertainty significantly. The green and blue boxes denote the ground-truth and predicted bounding boxes, respectively. (b) The comparative tracking results of our method and other state-of-the-art trackers.
  • Figure 2: The overall architecture of UncTrack, which consists of a transformer encoder, an uncertainty-aware localization decoder (ULD) and a prototype memory network (PMN). The paired template-search images are sent into the transformer encoder to capture the discriminative features. The encoded template-search features are further passed into ULD. The output uncertainty heatmap is transformed by the confidence inversion module (CIM) and combined with the online updated tokens in PMN, allowing UncTrack to construct reliable target-specific representation as a prototype memory bank for target state estimation.
  • Figure 3: Comparison of the proposed algorithm and several state-of-the-art trackers on the LaSOT benchmark. We evaluate the precision and success rate using one-pass evaluation (OPE).
  • Figure 4: Comparison of the proposed algorithm and several state-of-the-art trackers on the UAV123 benchmark. We evaluate distance precision and overlap success plots over 123 sequences using one-pass evaluation (OPE).
  • Figure 5: Comparison of the proposed algorithm and several state-of-the-art trackers on the OTB benchmark. We evaluate distance precision and overlap success plots over 100 sequences using one-pass evaluation (OPE).
  • ...and 6 more figures
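
The distance precision and overlap success measures named in the benchmark captions can be sketched as follows. This is a minimal version of the conventional OPE definitions, assuming boxes in (x, y, w, h) format, a 20-pixel center-error threshold for precision, and 21 evenly spaced overlap thresholds for the success curve; individual benchmark toolkits may differ in these details, and the function names are hypothetical.

```python
import numpy as np


def iou_xywh(a, b):
    """Per-frame IoU for two (N, 4) arrays of (x, y, w, h) boxes."""
    left = np.maximum(a[:, 0], b[:, 0])
    top = np.maximum(a[:, 1], b[:, 1])
    right = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    bottom = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(right - left, 0, None) * np.clip(bottom - top, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / np.maximum(union, 1e-12)  # guard against empty boxes


def center_error(a, b):
    """Per-frame Euclidean distance between box centers."""
    ca = a[:, :2] + a[:, 2:] / 2.0
    cb = b[:, :2] + b[:, 2:] / 2.0
    return np.linalg.norm(ca - cb, axis=1)


def ope_scores(pred, gt, dist_threshold=20.0):
    """Return (success AUC, distance precision) for one sequence."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    ious = iou_xywh(pred, gt)
    thresholds = np.linspace(0.0, 1.0, 21)
    # Fraction of frames whose overlap exceeds each threshold.
    success_curve = np.array([(ious > t).mean() for t in thresholds])
    auc = float(success_curve.mean())
    precision = float((center_error(pred, gt) <= dist_threshold).mean())
    return auc, precision
```

In one-pass evaluation the tracker is initialized on the first frame and run once through the sequence, so `pred` holds one box per frame from that single run.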