ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking

Tingyang Zhang, Chen Wang, Zhiyang Dou, Qingzhe Gao, Jiahui Lei, Baoquan Chen, Lingjie Liu

TL;DR

ProTracker addresses long-term tracking of arbitrary points in video by unifying short-term, local optical flow with long-term, global correspondences in a probabilistic integration framework. A hybrid filter prunes unreliable predictions, and bidirectional flow integration aggregates multiple noisy estimates into drift-resistant trajectories. The method further improves robustness by jointly incorporating long-term keypoints, which enables re-localization after a point disappears and improves occlusion handling. On the TAP-Vid and BADJA benchmarks, ProTracker achieves state-of-the-art performance among optimization-based trackers and competitive results against supervised methods, making it a practical option for drift-resistant dense point tracking in dynamic scenes.

Abstract

We propose ProTracker, a novel framework for accurate and robust long-term dense tracking of arbitrary points in videos. Previous methods relying on global cost volumes effectively handle large occlusions and scene changes but lack precision and temporal awareness. In contrast, local iteration-based methods accurately track smoothly transforming scenes but face challenges with occlusions and drift. To address these issues, we propose a probabilistic framework that marries the strengths of both paradigms by leveraging local optical flow for predictions and refined global heatmaps for observations. This design effectively combines global semantic information with temporally aware low-level features, enabling precise and robust long-term tracking of arbitrary points in videos. Extensive experiments demonstrate that ProTracker attains state-of-the-art performance among optimization-based approaches and surpasses supervised feed-forward methods on multiple benchmarks. The code and model will be released after publication.
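
To make the idea of combining flow-based predictions with heatmap-based observations concrete, the following is a minimal sketch of precision-weighted fusion of independent Gaussian position estimates. It is an illustration under stated assumptions, not the authors' implementation: the isotropic-variance model, the function name `fuse_estimates`, and the example numbers are all hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: (x, y, variance) of an isotropic 2D Gaussian.
Estimate = Tuple[float, float, float]


def fuse_estimates(estimates: List[Estimate]) -> Estimate:
    """Fuse independent isotropic Gaussian estimates by precision weighting.

    Each estimate is weighted by its inverse variance, so confident
    (low-variance) estimates pull the fused position more strongly.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    precision_sum = 0.0
    x_acc = 0.0
    y_acc = 0.0
    for x, y, var in estimates:
        if var <= 0:
            raise ValueError("variance must be positive")
        weight = 1.0 / var
        precision_sum += weight
        x_acc += weight * x
        y_acc += weight * y
    # The fused variance is the inverse of the summed precisions.
    return (x_acc / precision_sum, y_acc / precision_sum, 1.0 / precision_sum)


# Assumed example values: a noisy flow-based prediction and a sharper
# heatmap-based observation of the same point.
flow_prediction = (10.0, 20.0, 4.0)
heatmap_observation = (12.0, 20.0, 1.0)
fused = fuse_estimates([flow_prediction, heatmap_observation])
```

In this example the fused position lies closer to the lower-variance observation (x is about 11.6), and the fused variance (about 0.8) is smaller than either input variance, which is why aggregating several noisy estimates can reduce uncertainty.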

Paper Structure

This paper contains 21 sections, 13 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Results of tracking a single object. While DINO-Tracker may mispredict parts onto similar objects and TAPIR can be disrupted by similar patterns, our method avoids these errors.
  • Figure 2: Pipeline overview of our proposed method. (1) Sample & Chain: Key points are initially sampled and linked through optical flow chaining to produce preliminary trajectory predictions. (2) Long-term Correspondence: Key points are re-localized over longer time spans to maintain continuity, even for points that temporarily disappear. (3) Hybrid Filter: Masks and feature filters are applied to remove incorrect predictions, reducing noise for subsequent steps. (4) Probabilistic Integration: Filtered flow predictions across frames are first integrated and then combined with long-term keypoints to produce the final prediction, yielding smoother and more consistent trajectories.
  • Figure 2: Results of tracking a single object. While DINO-Tracker may lose some parts and TAPIR can be disrupted by multiple similar patterns, our method avoids these errors.
  • Figure 3: Bidirectional Probabilistic Flow Integration. Top row: Optical flow effectively tracks a point in the short term but may fail under occlusion due to its local nature. Bottom row: Long-term correspondence aids in globally relocating the target when the tracked point reappears. Once relocation is achieved, optical flow can resume tracking in the surrounding frames.
  • Figure 3: Results of tracking at a higher frame rate. Sliding-window-based methods can easily lose track after occlusion and drift as errors accumulate, while ours remains robust.
  • ...and 3 more figures
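
The "Sample & Chain" step named in the pipeline caption can be illustrated with a small sketch of optical flow chaining: a point is carried from frame to frame by repeatedly adding the flow vector found at its current position. This is a simplified illustration, not the paper's procedure. The nested-list flow layout, the nearest-pixel lookup (instead of interpolation), and the name `chain_flow` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: flow[row][col] = (dx, dy) displacement to the next frame.
Flow = List[List[Tuple[float, float]]]
Point = Tuple[float, float]


def chain_flow(start: Point, flows: List[Flow]) -> List[Optional[Point]]:
    """Chain per-frame flows into a trajectory starting at `start` = (x, y).

    Returns one entry per frame (len(flows) + 1 in total). Once the point
    leaves the image, the remaining entries are None.
    """
    track: List[Optional[Point]] = [start]
    x, y = start
    for flow in flows:
        # Nearest-pixel lookup; an assumption made to keep the sketch short.
        row, col = int(round(y)), int(round(x))
        if not (0 <= row < len(flow) and 0 <= col < len(flow[0])):
            # The point is outside the frame, so the chain cannot continue.
            track.extend([None] * (len(flows) + 1 - len(track)))
            break
        dx, dy = flow[row][col]
        x, y = x + dx, y + dy
        track.append((x, y))
    return track


# Assumed toy input: 3x3 frames where every pixel moves one pixel to the right.
constant_flow: Flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
trajectory = chain_flow((0.0, 1.0), [constant_flow] * 4)
```

Because each step reads the flow at a position that already contains earlier errors, small mistakes compound along the chain. This is the drift that the filtering and integration stages in the caption are described as counteracting.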