SRRT: Exploring Search Region Regulation for Visual Object Tracking

Jiawen Zhu; Xin Chen; Pengyu Zhang; Xinying Wang; Dong Wang; Wenda Zhao; Huchuan Lu

TL;DR

The paper tackles the rigidity of fixed-size search regions in visual object tracking, which can hamper performance under fast motion or distractor interference. It introduces SRRT, a dynamic paradigm that uses a Search Region Regulator (SRR) to predict an optimal per-frame search radius $\tilde{\boldsymbol{\tilde{y}}}$ and a locking-state-based update to refresh the dynamic reference when needed, enabling flexible, robust tracking. The method demonstrates consistent gains across eight benchmarks, notably achieving +4.6% and +3.1% improvements in AUC over strong baselines on LaSOT, and delivering state-of-the-art results on several datasets while preserving real-time speed. SRRT is designed as a plug-and-play enhancement that can be integrated with existing trackers with minimal overhead, broadening applicability in real-world tracking scenarios.

Abstract

The dominant trackers generate a fixed-size rectangular region based on the previous prediction or initial bounding box as the model input, i.e., search region. While this manner obtains promising tracking efficiency, a fixed-size search region lacks flexibility and is likely to fail in some cases, e.g., fast motion and distractor interference. Trackers tend to lose the target object due to the limited search region or experience interference from distractors due to the excessive search region. Drawing inspiration from the way humans track an object, we propose a novel tracking paradigm, called Search Region Regulation Tracking (SRRT), that applies a small eyereach when the target is captured and zooms out the search field when the target is about to be lost. SRRT applies a proposed search region regulator to estimate an optimal search region dynamically for each frame, by which the tracker can flexibly respond to transient changes in the location of object occurrences. To adapt to the object's appearance variation during online tracking, we further propose a locking-state determined updating strategy for reference frame updating. The proposed SRRT is concise without bells and whistles, yet achieves evident improvements and competitive results with other state-of-the-art trackers on eight benchmarks. On the large-scale LaSOT benchmark, SRRT improves SiamRPN++ and TransT with absolute gains of 4.6% and 3.1% in terms of AUC. The code and models will be released.
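To make the notion of a search region concrete, the following is a minimal sketch of how a square region could be derived from the previous prediction. It borrows the convention from the Figure 2 caption, where an $N$SR region covers $N^2$ times the previous object area. The function name `search_region`, the `(x, y, w, h)` box layout, and the choice of a square centred on the previous box are assumptions made for illustration, not the authors' implementation.

```python
import math


def search_region(prev_box, n):
    """Return a square search region (x, y, w, h) centred on prev_box.

    prev_box is the previous prediction as (x, y, w, h), with (x, y) the
    top-left corner. The region's area is n**2 times the box area, so its
    side length is n * sqrt(w * h). The region is not clipped to the image;
    how out-of-frame pixels are handled is left open here.
    """
    x, y, w, h = prev_box
    if w <= 0 or h <= 0 or n <= 0:
        raise ValueError("box size and n must be positive")
    side = n * math.sqrt(w * h)
    cx, cy = x + w / 2.0, y + h / 2.0
    return (cx - side / 2.0, cy - side / 2.0, side, side)


# A fixed-size tracker would always call this with the same n;
# a regulated tracker would pick n per frame.
small = search_region((40, 40, 20, 20), 2)
large = search_region((40, 40, 20, 20), 4)
```

Under these assumptions, a regulator only needs to output the factor `n` for each frame; the cropping step itself stays unchanged.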

Paper Structure (15 sections, 6 equations, 10 figures, 7 tables)

This paper contains 15 sections, 6 equations, 10 figures, 7 tables.

Introduction
Related Work
Visual Object Tracking
Search Region Generation
Search Region Regulation Tracking
Conventional Search Region Generation
Learning Search Region Regulation
Search Region Regulator
SRRT Pipeline
Experiments
Implementation Details
State-of-the-art Comparison
Ablation Studies
Limitation
Conclusion

Figures (10)

Figure 1: The paradigm of existing approaches (a) and the proposed one (b). Existing tracking approaches adopt a fixed-size search region for target object capturing. In contrast, the proposed SRRT paradigm has a dynamic field of view, giving the tracker a more robust and flexible tracking capability.
Figure 2: Minimum search region size distribution statistics of adjacent frames. '$N$SR': a search region $N^2$ times the previous object area. Across all the above benchmarks, small search regions ($2$SR) account for a very high proportion, while extraordinarily large search regions also occur.
Figure 3: Overview of SRRT paradigm. Search Region Regulator (SRR) sweeps through the candidate region against initial and dynamic reference frames to generate a prediction of the desired search radius. Afterward, the search region of the current frame is cropped according to the predicted search radius, and the search region and template patches are fed to the corresponding SR-dedicated tracker to obtain the final tracking results.
Figure 4: Intuitive presentation of the locking-state determined update strategy. During online tracking under the proposed SRRT paradigm, if the SRR assigns the minimum search region (e.g., $2^2$ times in this work) to the tracker for $K$ consecutive frames, we term it target-locking and the current frame will be updated as the dynamic reference frame. This strategy allows the model to gain enhanced adaptability to changes in the target appearance without introducing additional update network modules.
Figure 5: State-of-the-art comparison on OTB100 and NFS. SRRT consistently improves the performance of baselines. Best viewed in color with zoom-in.
...and 5 more figures
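
The locking-state update rule described in the Figure 4 caption can be sketched as a small counter over the regulator's per-frame output. This is only an illustration: the class name `LockingStateUpdater`, the parameter names, and the decision to reset the streak after each update are assumptions, since the caption does not specify what happens to the count once an update fires.

```python
class LockingStateUpdater:
    """Track how many consecutive frames used the minimum search region."""

    def __init__(self, k, min_radius=2):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k                    # K consecutive frames needed for target-locking
        self.min_radius = min_radius  # smallest search factor (2 in the caption's example)
        self.streak = 0

    def step(self, predicted_radius):
        """Return True if the current frame should become the dynamic reference."""
        if predicted_radius == self.min_radius:
            self.streak += 1
        else:
            self.streak = 0  # a larger region breaks the locking state
        if self.streak >= self.k:
            self.streak = 0  # assumption: start counting again after an update
            return True
        return False


updater = LockingStateUpdater(k=3)
decisions = [updater.step(r) for r in [2, 2, 4, 2, 2, 2, 2]]
# decisions == [False, False, False, False, False, True, False]
```

Because the rule only reads the regulator's existing output, it needs no extra network module, which matches the motivation given in the caption.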
