NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo, Changhong Fu, Jia Pan
TL;DR
NetTrack tackles multi-object tracking of highly dynamic open-world objects with two fine-grained components: a Net built on points of interest for robust association, and an object-text grounding module for precise localization. A fine-grained sampler and matching method support the Net, while grounding with open-vocabulary prompts lets the tracker generalize across diverse benchmarks without fine-tuning. The authors also introduce the Bird Flock Tracking (BFT) dataset to stress-test dynamicity, and they report state-of-the-art performance on BFT along with strong zero-shot transfer to TAO, TAO-OW, AnimalTrack, and GMOT-40. These results suggest that fine-grained learning can improve open-world MOT, with practical uses in ecological inspection, video editing, and descriptor-guided tracking.
Abstract
The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods depend solely on coarse-grained object cues, such as boxes and the overall appearance of the object, and are therefore susceptible to degradation when the internal relationships of dynamic objects are distorted. To address this problem, this work proposes NetTrack, an efficient, generic, and affordable tracking framework that introduces fine-grained learning robust to dynamicity. Specifically, NetTrack constructs a dynamicity-aware association with a fine-grained Net, leveraging point-level visual cues. Correspondingly, a fine-grained sampler and matching method are incorporated. Furthermore, NetTrack learns object-text correspondence for fine-grained localization. To evaluate MOT in extremely dynamic open-world scenarios, a bird flock tracking (BFT) dataset is constructed, which exhibits high dynamicity with diverse species and open-world scenarios. Comprehensive evaluation on BFT validates the effectiveness of fine-grained learning in handling object dynamicity, and thorough transfer experiments on challenging open-world benchmarks, i.e., TAO, TAO-OW, AnimalTrack, and GMOT-40, validate the strong generalization ability of NetTrack even without fine-tuning. Project page: https://george-zhuang.github.io/nettrack/.
