Leveraging Object Priors for Point Tracking

Bikram Boote, Anh Thai, Wenqi Jia, Ozgur Kara, Stefan Stojanov, James M. Rehg, Sangmin Lee

TL;DR

This work tackles long-term point tracking by addressing the common failure of points drifting off their object. It introduces objectness regularization, trained with ground-truth object masks, to bias points to stay inside object boundaries, eliminating the need for masks at test time. Coupled with a contextual attention module that enriches local features with neighborhood context, the method builds on the PIPs++ framework to produce more instance-aware trajectories. Experiments on PointOdyssey, TAP-Vid-DAVIS, and CroHD demonstrate state-of-the-art performance and robust, efficient tracking, with ablations confirming complementary gains from both proposed components.

Abstract

Point tracking is a fundamental problem in computer vision with numerous applications in AR and robotics. A common failure mode in long-term point tracking occurs when the predicted point leaves the object it belongs to and lands on the background or another object. We identify this as the failure to correctly capture objectness properties in learning to track. To address this limitation of prior work, we propose a novel objectness regularization approach that guides points to be aware of object priors by forcing them to stay inside the boundaries of object instances. By capturing objectness cues at training time, we avoid the need to compute object masks during testing. In addition, we leverage contextual attention to enhance the feature representation for capturing objectness at the feature level more effectively. As a result, our approach achieves state-of-the-art performance on three point tracking benchmarks, and we further validate the effectiveness of our components via ablation studies. The source code is available at: https://github.com/RehgLab/tracking_objectness

Paper Structure (25 sections, 5 equations, 4 figures, 6 tables)


Figures (4)

  • Figure 1: (a) shows an example where a tracked point leaves the object and fails to return to its location on the original object. (b) illustrates the concept of objectness regularization. Although Pred2 is worse than Pred1 because it misses the object, the conventional distance loss fails to distinguish the two cases. To address this, we further penalize out-of-object points to improve awareness of objectness during training.
  • Figure 2: Overall framework of our approach at training time. The model consists mainly of feature extraction, iterative inference, and objectness regularization. Contextual attention in the feature extraction improves the representation to better distinguish individual objects by encoding the neighborhood contexts for local regions. The objectness regularization guides the tracked points to stay inside the object by penalizing out-of-object points.
  • Figure 3: Qualitative results demonstrating the benefits of our approach. The examples show cases where our approach tracks the points on each object consistently well.
  • Figure 4: Qualitative results on the TAP-Vid-DAVIS dataset. Our method effectively tracks points in various scenarios with occlusion, motion blur, and changing orientations.
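
The idea in the Figure 1(b) caption can be illustrated with a toy calculation. The sketch below is not the paper's loss: the mask, the point coordinates, the hard 0/1 penalty, and the `weight` parameter are all hypothetical choices made for illustration. It only shows that two predictions at equal distance from the ground truth receive the same distance loss, and that adding a penalty for landing outside the object mask separates them.

```python
import math

# Hypothetical 6x6 binary object mask (1 = on the object), indexed as MASK[y][x].
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def distance_loss(pred, gt):
    """Euclidean distance between a predicted and a ground-truth (x, y) point."""
    return math.hypot(pred[0] - gt[0], pred[1] - gt[1])


def outside_object(pred, mask):
    """Return 1.0 if the rounded prediction is off the object, else 0.0.

    A hard indicator is an assumption made for readability; it is not
    differentiable, so a trainable model would need a smooth variant.
    """
    x, y = round(pred[0]), round(pred[1])
    in_bounds = 0 <= y < len(mask) and 0 <= x < len(mask[0])
    return 0.0 if in_bounds and mask[y][x] == 1 else 1.0


def regularized_loss(pred, gt, mask, weight=1.0):
    """Distance loss plus a weighted out-of-object penalty (illustrative only)."""
    return distance_loss(pred, gt) + weight * outside_object(pred, mask)


gt = (3, 2)     # ground-truth point on the right edge of the object
pred1 = (1, 2)  # 2 px to the left, still on the object
pred2 = (5, 2)  # 2 px to the right, on the background

print(distance_loss(pred1, gt), distance_loss(pred2, gt))
print(regularized_loss(pred1, gt, MASK), regularized_loss(pred2, gt, MASK))
```

In this toy setup both predictions have a distance loss of 2.0, while the regularized loss is 2.0 for `pred1` and 3.0 for `pred2`. The mask is consulted only when computing the loss, which mirrors the abstract's point that object masks are needed at training time but not at test time.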