TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes
Xiaoyan Cao, Yiyao Zheng, Yao Yao, Huapeng Qin, Xiaoyu Cao, Shihui Guo
TL;DR
This work addresses multi-object tracking (MOT) in scenes with complex, diverse motions by proposing a parallel association paradigm that uses motion and appearance cues jointly rather than relying on one cue or ranking one above the other. It also introduces BEE24, a bee-focused dataset with long sequences, small and similar-looking objects, occlusion, and highly variable motion that challenges existing trackers. The paradigm is implemented by TOPIC, a two-round parallel matching mechanism that adaptively selects between appearance- and motion-based matches based on a motion-level threshold, and is complemented by the Attention-based Appearance Reconstruction Module (AARM), which improves identity discrimination without additional training. Across five datasets (four public datasets plus BEE24), the resulting tracker, TOPICTrack, achieves state-of-the-art results and reduces false negatives by 6% to 81% relative to the single-feature association paradigm, while ablation studies confirm that both TOPIC and AARM improve robustness under complex motions and diverse scenes.
Abstract
Video data and algorithms have been driving advances in multi-object tracking (MOT). While existing MOT datasets focus on occlusion and appearance similarity, complex motion patterns are widespread yet overlooked. To address this issue, we introduce a new dataset called BEE24 to highlight complex motions. Identity association algorithms have long been the focus of MOT research. Existing trackers can be categorized into two association paradigms: the single-feature paradigm (based on either motion or appearance features) and the serial paradigm (one feature serves as primary and the other as secondary). However, neither paradigm fully utilizes the different features. In this paper, we propose a parallel paradigm and present the Two rOund Parallel matchIng meChanism (TOPIC) to implement it. TOPIC leverages both motion and appearance features and adaptively selects the preferable one as the assignment metric based on motion level. Moreover, we provide an Attention-based Appearance Reconstruction Module (AARM) that reconstructs appearance feature embeddings, thus enhancing the representation of appearance features. Comprehensive experiments show that our approach achieves state-of-the-art performance on four public datasets and BEE24. BEE24 also challenges existing trackers to track multiple similar-appearing small objects with complex motions over long periods, which is critical in real-world applications such as beekeeping and drone swarm surveillance. Notably, our proposed parallel paradigm surpasses existing association paradigms by a large margin, e.g., reducing false negatives by 6% to 81% compared to the single-feature association paradigm. The dataset and association paradigm introduced in this work offer a fresh perspective for advancing the MOT field. The source code and dataset are available at https://github.com/holmescao/TOPICTrack.
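
To make the idea of parallel association concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each track has a scalar motion level, that appearance is preferred when this level exceeds a threshold and motion otherwise (the abstract does not state the direction), and that the two steps below approximate the two rounds. The names greedy_match, parallel_associate, level_threshold, and max_cost are hypothetical, and the greedy matcher is a simple stand-in for the optimal assignment solver (e.g., the Hungarian algorithm) that trackers commonly use.

```python
def greedy_match(cost, max_cost):
    """Greedy one-to-one matching on a cost matrix (rows: tracks, cols: detections).

    Stand-in for an optimal assignment solver; pairs above max_cost are rejected.
    """
    candidates = sorted(
        (c, t, d)
        for t, row in enumerate(cost)
        for d, c in enumerate(row)
        if c <= max_cost
    )
    matches, used_dets = {}, set()
    for _, t, d in candidates:
        if t not in matches and d not in used_dets:
            matches[t] = d
            used_dets.add(d)
    return matches


def parallel_associate(motion_cost, appearance_cost, motion_level,
                       level_threshold=0.5, max_cost=0.7):
    """Return {track_index: detection_index} using both cues in parallel."""
    # Step 1: match with each cue independently (neither cue gates the other).
    by_motion = greedy_match(motion_cost, max_cost)
    by_appearance = greedy_match(appearance_cost, max_cost)

    final, used_dets = {}, set()

    # Pairs on which both cues agree are accepted first.
    for t, d in by_motion.items():
        if by_appearance.get(t) == d:
            final[t] = d
            used_dets.add(d)

    # Step 2: for the remaining tracks, take the proposal of the preferred cue.
    # ASSUMPTION: high motion level -> trust appearance; low -> trust motion.
    for t in range(len(motion_cost)):
        if t in final:
            continue
        if motion_level[t] > level_threshold:
            ordered = (by_appearance, by_motion)
        else:
            ordered = (by_motion, by_appearance)
        for proposal in ordered:
            d = proposal.get(t)
            if d is not None and d not in used_dets:
                final[t] = d
                used_dets.add(d)
                break
    return final


# Toy example: the two cues propose opposite assignments for two tracks.
motion_cost = [[0.2, 0.6], [0.3, 0.1]]       # e.g. 1 - IoU with predicted boxes
appearance_cost = [[0.6, 0.1], [0.2, 0.65]]  # e.g. cosine distance of embeddings

print(parallel_associate(motion_cost, appearance_cost, motion_level=[0.9, 0.9]))
print(parallel_associate(motion_cost, appearance_cost, motion_level=[0.1, 0.1]))
```

In the toy example the motion and appearance cues disagree on every pair, so the per-track motion level alone decides which proposal is kept. A serial scheme would instead let one cue filter candidates before the other is consulted, and a single-feature scheme would ignore one cost matrix entirely.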
