Towards Generalizable Multi-Object Tracking
Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang
TL;DR
The paper tackles the generalization gap in MOT by identifying key scenario attributes that influence tracker performance and proposing GeneralTrack, a point-wise to instance-wise relation framework. GeneralTrack uses a multi-scale point-region relation and hierarchical aggregation to avoid manually balancing motion and appearance across diverse scenes, enabling robust cross-scenario tracking. It achieves state-of-the-art results on multiple benchmarks (notably 1st on BDD100K with 57.87 mTETA) and demonstrates strong domain generalization without dataset-specific tuning. The work combines thorough attribute analysis, end-to-end relational modeling, and comprehensive ablations to validate the approach and points toward future work on multi-frame relations.
Abstract
Multi-Object Tracking (MOT) encompasses various tracking scenarios, each characterized by unique traits. Effective trackers should demonstrate a high degree of generalizability across diverse scenarios. However, existing trackers struggle to accommodate all aspects or necessitate hypothesis and experimentation to customize the association information (motion and/or appearance) for a given scenario, leading to narrowly tailored solutions with limited generalizability. In this paper, we investigate the factors that influence trackers' generalization to different scenarios and concretize them into a set of tracking scenario attributes to guide the design of more generalizable trackers. Furthermore, we propose a point-wise to instance-wise relation framework for MOT, i.e., GeneralTrack, which can generalize across diverse scenarios while eliminating the need to balance motion and appearance. Thanks to its superior generalizability, our proposed GeneralTrack achieves state-of-the-art performance on multiple benchmarks and demonstrates the potential for domain generalization.
Code: https://github.com/qinzheng2000/GeneralTrack.git
