OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking
Zekun Qian, Ruize Han, Wei Feng, Junhui Hou, Linqi Song, Song Wang
TL;DR
This work introduces Open-Corpus MOT (OCMOT), a practical extension of multi-object tracking that localizes, associates, and generatively recognizes objects from both seen base classes and unseen novel classes without predefined category lists. It establishes OCTrackB, a large-scale benchmark built from TAO and LV-VIS to ensure base/novel diversity, rich sampling, and semantic compatibility, along with a multi-granularity recognition metric mgReA and the composite TRETA score for evaluation. A baseline method, OCTracker, combines a class-agnostic detector (Deformable DETR), a generative recognition head (FlanT5-base), and a two-stage association learning pipeline to tackle open-corpus recognition. Experimental results across diverse baselines demonstrate that while localization and tracking are strong with existing MOT approaches, open-corpus recognition remains challenging, and the proposed evaluation framework effectively highlights improvements from open-vocabulary and generative recognition strategies. The OCTrackB benchmark and mgReA/TRETA metrics provide a practical foundation for research on open-world MOT and the broader deployment of trackers in real-world, taxonomy-rich environments.
Abstract
We study a novel yet practical problem, open-corpus multi-object tracking (OCMOT), which extends MOT to localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, without a category text list as a prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensive benchmark that provides a standard evaluation platform for the OCMOT problem. Compared to previous datasets, OCTrackB has more abundant and balanced base/novel classes, with corresponding samples that allow less biased evaluation. We also propose a new multi-granularity recognition metric to better evaluate generative object recognition in OCMOT. Through extensive benchmark evaluation, we report and analyze the results of various state-of-the-art methods, which demonstrate the rationale of OCMOT as well as the usefulness and advantages of OCTrackB.
