OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking

Zekun Qian, Ruize Han, Wei Feng, Junhui Hou, Linqi Song, Song Wang

TL;DR

This work introduces Open-Corpus MOT (OCMOT), a practical extension of multi-object tracking that localizes, associates, and generatively recognizes objects from both seen base classes and unseen novel classes without predefined category lists. It establishes OCTrackB, a large-scale benchmark built from TAO and LV-VIS to ensure base/novel diversity, rich sampling, and semantic compatibility, along with a multi-granularity recognition metric mgReA and the composite TRETA score for evaluation. A baseline method, OCTracker, combines a class-agnostic detector (Deformable DETR), a generative recognition head (FlanT5-base), and a two-stage association learning pipeline to tackle open-corpus recognition. Experimental results across diverse baselines demonstrate that while localization and tracking are strong with existing MOT approaches, open-corpus recognition remains challenging, and the proposed evaluation framework effectively highlights improvements from open-vocabulary and generative recognition strategies. The OCTrackB benchmark and mgReA/TRETA metrics provide a practical foundation for research on open-world MOT and the broader deployment of trackers in real-world, taxonomy-rich environments.

Abstract

We study a novel yet practical problem, open-corpus multi-object tracking (OCMOT), which extends MOT to localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes without a category text list as a prompt. Studying this problem first requires a benchmark. In this work, we build OCTrackB, a large-scale and comprehensive benchmark that provides a standard evaluation platform for OCMOT. Compared to previous datasets, OCTrackB has more abundant and balanced base/novel classes, with corresponding samples that allow less biased evaluation. We also propose a new multi-granularity recognition metric to better evaluate generative object recognition in OCMOT. Through extensive benchmark evaluation, we report and analyze the results of various state-of-the-art methods, which demonstrate the rationale of OCMOT as well as the usefulness and advantages of OCTrackB.

Paper Structure (14 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Illustration of the open-vocabulary and open-corpus multi-object tracking.
  • Figure 2: Statistics and comparison of the object categories appearing in the datasets.
  • Figure 3: Normalized entropy of different units.
  • Figure 4: Statistics of the videos, track, and objects for base/novel classes in different datasets.
  • Figure 5: Illustration of the multi-granularity evaluation metric.
  • ...and 2 more figures