Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023
Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji
TL;DR
The paper tackles robust single-point tracking on TAP-Vid under zero-shot conditions. It focuses on videos shot by a static camera, where TAPIR's predictions for static points jitter and drift. It introduces TAPIR+, which adds two components: Multi-granularity Camera Motion Detection, which classifies each video as shot by a static or a moving camera, and Confident Moving Region (CMR)-based trajectory prediction, which stabilizes static points in static-camera videos. For moving-camera videos, TAPIR's outputs are kept unchanged. In zero-shot experiments on MOVi-F, TAPIR+ improves substantially over TAPIR, particularly on static-camera videos (an AJ gain of about 2.8), and it ranked first in the final test (AJ of 47.19 on static-camera videos and 45.78 on moving-camera videos). The approach makes single-point tracking more robust across camera motion types and offers a practical solution for TAP-Vid-like perception tasks.
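The gating and rectification steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes TAPIR trajectories are already available as per-frame (x, y) positions, that the camera-motion decision is given as a boolean, and that the CMR is a binary mask (the hypothetical input `moving_mask`) marking pixels covered by moving objects. It also assumes that a static point in a static-camera video should simply stay at its query location. Occlusion flags are ignored for brevity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def rectify_static_points(
    tracks: List[List[Point]],              # tracks[i][t]: TAPIR prediction for query i at frame t
    queries: Sequence[Tuple[int, Point]],   # (query_frame, (x, y)) for each query point
    moving_mask: Sequence[Sequence[bool]],  # hypothetical CMR: moving_mask[y][x] is True inside moving regions
    camera_is_static: bool,                 # output of a camera-motion detector (assumed given)
) -> List[List[Point]]:
    """Hold points outside the moving region at their query location (assumed rule)."""
    if not camera_is_static:
        # Moving camera: pass TAPIR trajectories through unchanged.
        return [list(track) for track in tracks]
    height, width = len(moving_mask), len(moving_mask[0])
    rectified: List[List[Point]] = []
    for track, (_, (qx, qy)) in zip(tracks, queries):
        # Round the query to a pixel and clamp it to the mask bounds before lookup.
        col = min(max(int(round(qx)), 0), width - 1)
        row = min(max(int(round(qy)), 0), height - 1)
        if moving_mask[row][col]:
            # Inside the confident moving region: keep TAPIR's trajectory.
            rectified.append(list(track))
        else:
            # Static point: replace the jittery trajectory with its fixed query position.
            rectified.append([(qx, qy)] * len(track))
    return rectified
```

Keeping the moving-camera branch as a pass-through mirrors the statement that TAPIR outputs are preserved for moving cameras; only the static-camera branch alters predictions.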
Abstract
This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any point on a physical surface through a video. Several existing approaches address TAP by modeling temporal relationships to obtain smooth point trajectories; however, they still suffer from cumulative errors caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Specifically, our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera; and (2) Confident Moving Region (CMR)-based point trajectory prediction, which uses a moving-object segmentation approach to separate static points from moving objects. Our approach ranked first in the final test with a score of 0.46.
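The abstract does not describe how Multi-granularity Camera Motion Detection works internally. One hypothetical way to realize the idea of checking motion at several granularities is multi-scale frame differencing: camera motion changes nearly every region of the frame, whereas object motion in a static-camera video changes only a few regions. In the sketch below, the function names, thresholds, and grid sizes are all illustrative assumptions rather than the authors' settings; a frame pair is flagged as camera motion only if most grid cells change at every granularity, and the video is labeled static when few pairs are flagged.

```python
from typing import Sequence

# Grayscale frame indexed as frame[y][x]; an assumed input format.
Frame = Sequence[Sequence[float]]


def changed_cell_fraction(a: Frame, b: Frame, cells: int, pixel_thresh: float) -> float:
    """Split both frames into a cells x cells grid and return the fraction of
    cells whose mean absolute intensity change exceeds pixel_thresh.
    Assumes cells <= min(height, width); empty cells are never counted as changed."""
    h, w = len(a), len(a[0])
    changed = 0
    for i in range(cells):
        for j in range(cells):
            y0, y1 = i * h // cells, (i + 1) * h // cells
            x0, x1 = j * w // cells, (j + 1) * w // cells
            diffs = [abs(a[y][x] - b[y][x]) for y in range(y0, y1) for x in range(x0, x1)]
            if diffs and sum(diffs) / len(diffs) > pixel_thresh:
                changed += 1
    return changed / (cells * cells)


def camera_is_static(
    frames: Sequence[Frame],
    granularities: Sequence[int] = (1, 2, 4),  # hypothetical grid sizes (coarse to fine)
    pixel_thresh: float = 10.0,                # hypothetical per-cell intensity threshold
    cell_ratio: float = 0.75,                  # share of cells that must change for a pair to count
    frame_ratio: float = 0.5,                  # share of pairs allowed to look like camera motion
) -> bool:
    """Hypothetical multi-granularity test: a frame pair counts as camera
    motion only if change is widespread at every granularity."""
    moving_pairs = 0
    for a, b in zip(frames, frames[1:]):
        if all(changed_cell_fraction(a, b, g, pixel_thresh) >= cell_ratio for g in granularities):
            moving_pairs += 1
    num_pairs = max(len(frames) - 1, 1)
    return moving_pairs <= frame_ratio * num_pairs
```

Requiring widespread change at every granularity is what separates the two cases: a large moving object can dominate the coarse 1x1 grid, but it rarely fills most cells of the finer grids, while a camera pan changes all of them.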
