Improving Accuracy and Generalization for Efficient Visual Tracking

Ram Zaveri, Shivang Patel, Yu Gu, Gianfranco Doretto

TL;DR

This work targets efficient visual tracking with strong generalization to out-of-distribution sequences. It introduces SiamABC, a lightweight Siamese tracker leveraging a dual-template and dual-search-region bridge, a Fast Mixed Filtration module, and Transitive Relation Loss to better align temporal representations, plus a backward-free Dynamic Test-Time Adaptation to adjust to target appearance shifts during inference. Empirically, SiamABC-Tiny achieves 100 FPS on CPU and outperforms MixFormerV2-S on the challenging AVisT OOD benchmark by 7.6% in AUC, while SiamABC-Small maintains strong accuracy with high throughput across ID and OOD benchmarks. The combination of architectural innovations and efficient online adaptation enables robust, real-time tracking suitable for resource-constrained, in-the-wild deployments, with code and models publicly available.

Abstract

Efficient visual trackers tend to overfit to their training distributions and generalize poorly: they perform well on their respective in-distribution (ID) test sets but worse on out-of-distribution (OOD) sequences, which limits their deployment in the wild under constrained resources. We introduce SiamABC, a highly efficient Siamese tracker that significantly improves tracking performance, even on OOD sequences. SiamABC introduces new architectural designs for bridging the dynamic variability of the target, along with new training losses. It also directly addresses OOD generalization with a fast, backward-free dynamic test-time adaptation method that continuously adapts the model to the changing appearance of the target. Our extensive experiments suggest that SiamABC achieves substantial performance gains on OOD sets while maintaining accuracy on the ID benchmarks. SiamABC outperforms MixFormerV2-S by 7.6% on the OOD AVisT benchmark while being 3x faster (100 FPS) on a CPU. Our code and models are available at https://wvuvl.github.io/SiamABC/.
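
To make the idea of backward-free test-time adaptation concrete, the following is a minimal sketch of one common gradient-free strategy: blending stored normalization statistics with statistics measured on the current frame. It is not taken from the paper, and the abstract does not specify SiamABC's actual update rule. The class name `RunningNorm`, the `momentum` parameter, and the per-channel list layout are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


class RunningNorm:
    """Per-channel normalization whose statistics adapt at test time.

    Illustrative only: an assumed stand-in for a backward-free adaptation
    step, not the method used by SiamABC.
    """

    def __init__(self, mean: Sequence[float], var: Sequence[float],
                 momentum: float = 0.1, eps: float = 1e-5) -> None:
        self.mean = list(mean)    # statistics stored from training
        self.var = list(var)
        self.momentum = momentum  # hypothetical adaptation rate in [0, 1]
        self.eps = eps            # avoids division by zero

    def adapt(self, features: Sequence[Sequence[float]]) -> None:
        """Blend stored statistics with those of the current frame.

        features[c] holds the activations of channel c for one frame.
        Only forward-pass quantities are used, so no gradients are needed.
        """
        m = self.momentum
        for c, channel in enumerate(features):
            n = len(channel)
            frame_mean = sum(channel) / n
            frame_var = sum((v - frame_mean) ** 2 for v in channel) / n
            self.mean[c] = (1 - m) * self.mean[c] + m * frame_mean
            self.var[c] = (1 - m) * self.var[c] + m * frame_var

    def normalize(self, features: Sequence[Sequence[float]]) -> List[List[float]]:
        """Normalize each channel with the current (adapted) statistics."""
        return [
            [(v - self.mean[c]) / (self.var[c] + self.eps) ** 0.5 for v in channel]
            for c, channel in enumerate(features)
        ]
```

In a tracking loop under these assumptions, `adapt` would be called on the features of each incoming frame before `normalize`, so the cost per frame is a few element-wise passes and no backward pass.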

Paper Structure

This paper contains 20 sections, 7 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Comparison of our trackers with others on the AVisT dataset (Noman et al., 2022) on a CPU. We show the success score (AUC) (vertical axis), speed (horizontal axis), and relative number of FLOPs (circles) of the trackers. Our trackers outperform other efficient trackers in terms of both speed and accuracy.
  • Figure 2: Overall Architecture. The Feature Extraction Block uses a readily available backbone to process the frames. The Relation-Aware Block exploits representational relations among the dual-template and dual-search-region through our losses, $\mathcal{L}_{TR}$ and $\mathcal{L}_{Reg}$, where dual-template and dual-search-region representations are obtained via our learnable FMF layer. The Heads Block learns lightweight convolution layers to infer the bounding box and the classification score through standard tracking losses, $\mathcal{L}_{IoU}$ and $\mathcal{L}_{FL}$ respectively. During inference, the tracker adapts to every instance through our Dynamic Test-Time Adaptation framework.
  • Figure 3: Fast Mixed Filtration. This block serves as a lightweight and effective attention mechanism. The input $x$ is filtered to produce the compressed representations $\check{\mathbf{x}}$. The broadcast and element-wise operations make this block efficient on CPU.
  • Figure 4: Ablation study on the components of SiamABC-Tiny. Top row: ablation on the FMF block. Middle row: ablation on the TRL. Bottom row: ablation on the squeeze rate.
  • Figure 5: Qualitative comparison on the AVisT dataset (Noman et al., 2022) with other efficient trackers, and additionally with Ocean. Under adverse visibility conditions, our tracker, SiamABC-Tiny (S-Tiny), is relatively stable compared to the others while running at 100 FPS on a CPU.
  • ...and 1 more figure
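
Figure 1 reports the success score (AUC). As a rough illustration of how this metric is commonly computed in tracking benchmarks, the sketch below averages the fraction of frames whose predicted box overlaps the ground truth by more than a threshold, over a grid of IoU thresholds. The `(x, y, width, height)` box layout, the 21-point threshold grid, and the strict inequality are assumptions; the exact evaluation protocol used for AVisT may differ.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box],
                num_thresholds: int = 21) -> float:
    """Area under the success curve, approximated by the mean success rate.

    The success rate at threshold t is the fraction of frames with IoU > t.
    Thresholds are evenly spaced in [0, 1] (an assumed grid).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    overlaps: List[float] = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Under this definition a tracker that never overlaps the target scores 0. Because of the strict inequality at threshold 1, even perfect predictions score slightly below 1 on a finite grid.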